The data provided by a single remote sensing instrument may be insufficient for many
precision  mapping  applications.  Additional  sources  may  provide  complementary  or 
redundant  data  which  can  be  used  to  improve  the  information  extraction  process.  The 
fusion of multisource data can improve accuracy and create more consistent recognition
of  land  cover  patterns. 

The  objective  of  this  research  is  to  investigate  possible  strategies  for  the  fusion  of 
airborne  laser  data  with  passive  optical  data  for  object  space  classification.  A significant 
contribution  of  our  work  is  the  development  and  implementation  of  a data-level  fusion 
technique,  direct  digital  image  georeferencing  (DDIG).  In  DDIG,  we  use  navigation  data 
from  an  integrated  system  (composed  of  global  positioning  system  (GPS)  and  inertial 
measurement  unit  (IMU))  to  project  three-dimensional  data  points  measured  with  the 
University of Florida’s airborne laser swath mapping (ALSM) system onto digital aerial
photographs. As an underlying math model, we use the familiar collinearity condition
equations.  After  matching  the  ALSM  object  space  points  to  their  corresponding  image 
space  pixels,  we  resample  the  digital  photographs  using  cubic  convolution  techniques. 
We  call  the  resulting  images  pseudo-ortho-rectified  images  (PORI)  because  they  are 
orthographic  at  the  ground  surface  but  still  exhibit  some  relief  displacement  for  elevated 
objects; and because they have been resampled using an interpolation technique. Our
accuracy  tests  on  these  PORI  images  show  that  they  are  planimetrically  correct  to  about 
0.4  meters.  This  accuracy  is  sufficient  to  remove  most  of  the  effects  of  the  central 
perspective  projection  and  enable  a meaningful  fusion  of  the  RGB  data  with  the  height 
and  intensity  data  produced  by  the  laser.  PORI  images  may  also  be  sufficiently  accurate 
for  many  other  mapping  applications,  and  may  in  some  applications  be  an  attractive 
alternative  to  traditional  photogrammetric  techniques. 

A second  contribution  of  our  research  is  the  development  of  several  strategies  for  the 
fusion  of  data  from  airborne  laser  and  camera  systems.  We  have  conducted  our  work 
within  the  sensor  fusion  paradigm  formalized  in  the  optical  engineering  community.  Our 
work  explores  the  fusion  of  these  two  types  of  data  for  precision  mapping  applications. 

Specifically,  we  combine  three  different  types  of  data:  the  high  resolution  color 
images,  the  lower  resolution  near  infrared  (NIR)  intensity  images,  and  digital  elevation 
model  (DEM).  We  then  investigate  the  use  of  a supervised  statistical  pattern  recognition 
technique  to  combine  these  data  for  land-cover  classification.  We  also  investigate  two 
decision-level  data  fusion  algorithms:  an  expert  system  and  an  approach  based  on 
Dempster-Shafer evidential theory.

A common study area was selected for these three data fusion techniques, and the
same reference data were used to complete the divergence analyses for the classification
maps. The highest overall accuracy of supervised classification using the maximum
likelihood algorithm is 91.32%, obtained with the red, blue, height, and intensity (RBHI)
combination; the overall accuracy of rule-based classification is 92.74%; and the overall
accuracy of the Dempster-Shafer approach is 94.24%.

In  this  research,  we  demonstrate  that  data  fusion  can  be  a powerful  tool  for  the  image 
analyst  wishing  to  take  advantage  of  object  space  classification.  The  results  show  that  the 
fusion  of  data  from  different  sources  can  share  redundant  and  complementary 
information, and also provide greater classification detail and accuracy.

CHAPTER 1
INTRODUCTION 

In  the  mapping  sciences,  the  ultimate  goal  is  to  completely  describe  the  object  space 
with  regard  to  the  type  and  location  of  the  individual  objects  within  it,  and  to  represent 
this  understanding  in  an  iconic  form  as  a map.  Two  approaches  are  widely  used.  When 
high  planimetric  accuracy  or  elevation  data  are  required,  maps  are  produced  using 
photogrammetric  techniques.  For  small  scale  or  land-cover  mapping  projects,  automated 
segmentation  and  classification  algorithms  are  applied  to  passive  spectral  data  in  order  to 
produce  classification  maps. 

Over  the  past  decade,  the  differences  between  these  two  approaches  have  become  less 
distinct.  With  the  development  of  softcopy  workstations,  photogrammetrists  have 
adopted  machine  vision  techniques  that  support  object  recognition  and  classification 
(Schenk,  1999),  and  the  spatial  resolution  of  spectral  images  has  improved  to  the  level 
where  these  data  can  be  used  to  detect  and  accurately  map  small  objects  (Boardman, 
1999).  In  addition  to  these  developments,  other  technologies  have  begun  to  compete  with 
photogrammetry. For example, the rapid evolution of two active sensing technologies,
light detection and ranging (LIDAR) and synthetic aperture radar (SAR), has made it
possible to produce three-dimensional data directly, and to simultaneously generate
two-dimensional images of the intensity of the return signal.

Over  the  past  five  years,  the  use  of  airborne  laser  mapping  technology  for  the 
generation  of  digital  elevation  models  has  become  commonplace  (Carter  & Shrestha, 
2001). Recently, a few researchers have begun to investigate the use of LIDAR
“intensity” for mapping or object recognition studies (Park et al., 2001). But the
classification  of  the  object  space  is  still  best  achieved  through  the  analysis  of  multiple 
channels  of  passive  spectral  data.  For  example,  inexpensive  digital  cameras  can  be  used 
to  generate  three-channel  imagery  with  pixel  sizes  on  the  order  of  several  decimeters 
(Park  et  al.,  2002).  These  digital  photographs  can  be  classified  using  familiar  pattern 
recognition  techniques  (e.g.,  clustering  algorithms)  to  generate  useful  classifications. 
Also,  imaging  spectrometers  have  been  deployed  on  low-flying  aircraft  to  generate 
hyperspectral  data  having  spatial  resolution  on  the  order  of  one  meter  (Tuell  et  al.,  2000). 
These  data  can  be  analyzed  using  vector-based  algorithms  to  detect  the  presence  of 
objects, even at the sub-pixel level (e.g., Smith, 1990; Harsanyi, 1993; Hoffbeck and
Landgrebe,  1996).  Clearly,  the  use  of  a combination  of  spectral  data  with  geometric  data 
ought  to  support  a more  robust  description  of  the  object  space. 

The  topic  of  data  fusion  is  of  broad  interest  within  a number  of  disciplines.  It  is  highly 
developed  within  the  optical  engineering  community,  where  work  has  focused  on  military 
applications  related  to  automated  target  recognition  (Klein,  1993).  Also,  researchers  in 
the  medical  imaging  community  have  applied  it  to  the  segmentation  of  MRI  images  (Zhu 
et  al.,  2002).  In  the  machine  vision  community,  researchers  have  demonstrated 
improvements  in  the  ability  of  robots  to  model  the  environment  (Binford,  1982).  In  the 
physical  sciences,  related  work  is  often  published  under  the  topic  of  data  assimilation. 
Recent  work  in  that  field  employs  techniques  to  estimate  a dynamic,  high-dimensional 
state  vector  which  describes  the  temporal  characteristics  of  the  global  ocean,  by  merging 
information computed individually from multiple sensors (Stammer et al., 2002).

In the mapping sciences, a few researchers have already reported on the fusion of
remotely  sensed  data.  For  example,  Izraelevitz  (1994),  Wilson  et  al.  (1995),  and  Yocky 
(1996)  have  demonstrated  the  ability  to  generate  new  image  products  by  sharpening  the 
resolution  of  a spectral  image  with  a higher  resolution  panchromatic  image.  Wald, 
Ranchin,  and  Mangolini  (1997)  have  similarly  proposed  methods  to  produce 
multispectral  images  with  enhanced  spatial  resolution  using  one  or  more  images  of  the 
same  scene  of  better  spatial  resolution.  They  also  measured  the  performance  of  a method 
to  synthesize  the  radiometry  in  a single  spectral  band,  as  well  as  the  multispectral 
information,  when  increasing  the  spatial  resolution.  Haack  and  Bechdol  (1999)  evaluated, 
independently and in combination, the relative utility of merging traditional spaceborne
optical  data  from  the  visible  and  infrared  wavelengths  with  radar  data.  Others  have 
demonstrated  the  use  of  digital  elevation  data  for  the  correction  of  geometric, 
topographic, and atmospheric effects on radiance of spectral data (e.g., Conese et al.,
1993; Crippen, 1987).

Importantly, a few examples showing improvements in object recognition and
classification have also been reported (Haala, 1994; Kim et al., 1995). These
experiments  have  shown  the  potential  of  merging  panchromatic  images  and  digital 
elevation  data  for  the  auto-extraction  of  buildings  in  urban  scenes.  Madhok  and 
Landgrebe  (1999)  showed  that  combining  hyperspectral  and  DEM  data  could 
substantially  sharpen  the  identification  of  building  boundaries,  reduce  classification  error, 
and  lessen  dependence  on  the  analyst  for  classifier  construction.  These  experiments  have 
great  potential.  They  offer  renewed  hope  of  progress  toward  the  elusive  goal  of 
automating the mapping task. But even as the capabilities of sensor technology and the
interest in sensor fusion have increased, we lack a fundamental overview of the strategies
by  which  we  might  implement  fusion  in  the  mapping  sciences,  and  the  knowledge  as  to 
which  strategies  may  be  preferred  for  certain  specific  tasks.  In  this  research,  we  seek  to 
address  these  questions. 

In  most  cases,  the  data  provided  by  a single  sensor  are  incomplete,  inconsistent,  or 
imprecise  for  object  recognition.  Additional  sources  may  provide  complementary  or 
redundant  data,  which  leads  to  the  question:  Can  we  use  data  fusion  techniques  to 
increase  our  ability  to  extract  information  from  the  measured  data?  A few  researchers  (for 
example,  Le  Hegarat-Mascle  et  al.,  1997;  Ehlers,  1991)  have  demonstrated  the  use  of 
redundant data for reducing imprecision. They have also shown the fusion of
complementary  data  to  be  useful  in  creating  more  consistent  recognition  of  land  cover 
patterns.  But,  what  is  the  best  strategy  for  a given  goal?  In  our  research,  we  attempt  to 
address  this  topic  with  regard  to  a contemporary  problem  in  the  mapping  sciences:  how  to 
best  merge  passive  reflected  spectral  data  with  height  and  intensity  data  from  airborne 
laser  systems. 

Objective  of  Research 

The  objective  of  this  research  is  to  investigate  possible  strategies  for  the  fusion  of  data 
from  an  advanced  mapping  system  developed  at  the  University  of  Florida.  The  system  is 
composed of a model ALTM 1210 topographic LIDAR manufactured by Optech Inc., and a
Kodak  model  420  color  digital  camera.  The  high  resolution  digital  color  images,  high 
accuracy  digital  elevation  data,  and  the  near  infrared  intensity  image  from  the  laser  are 
measured  simultaneously.  This  provides  a cost  and  time  efficient  data  acquisition  system 
for mapping, ground object recognition, and land-cover classification.

In any data fusion experiment, the proper registration of the data sets is of critical
importance.  As  a preprocessing  step  in  our  work,  we  developed  a direct  digital  image 
georeferencing  (DDIG)  procedure  to  coregister  the  spectral,  elevation,  and  laser  intensity 
data.  This  procedure  solves  the  important  problem  of  converting  the  central  perspective 
of  the  digital  photographs  to  an  orthographic  image,  which  can  be  properly  merged  with 
the  topographic  measurements  from  the  laser  data.  We  then  investigate  the  use  of  a 
supervised  statistical  pattern  recognition  technique,  maximum  likelihood  classifier,  and 
two  decision-level  fusion  techniques:  an  expert  system,  and  Dempster-Shafer  evidential 
theory  for  the  automated  extraction  of  information  from  these  data.  Specifically,  we 
produce  land-cover  maps  of  the  project  site  using  accepted  remote  sensing  classification 
methods,  but  we  have  applied  them  to  multisource  data. 
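
The georeferencing step can be illustrated with the collinearity condition equations named in the abstract. The following is a minimal sketch and not the DDIG implementation itself: it assumes the omega-phi-kappa rotation convention common in photogrammetry texts, and the function names, camera position, and focal length are hypothetical. Lens distortion and the GPS/IMU offset and boresight corrections are left out.

```python
import math

def rotation_matrix(omega, phi, kappa):
    """Object-to-image rotation matrix for omega, phi, kappa given in radians."""
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    return [
        [cp * ck, so * sp * ck + co * sk, -co * sp * ck + so * sk],
        [-cp * sk, -so * sp * sk + co * ck, co * sp * sk + so * ck],
        [sp, -so * cp, co * cp],
    ]

def project_point(ground_xyz, camera_xyz, m, focal_length, x0=0.0, y0=0.0):
    """Collinearity equations: map one object space point to image coordinates."""
    dx, dy, dz = (g - c for g, c in zip(ground_xyz, camera_xyz))
    r = m[0][0] * dx + m[0][1] * dy + m[0][2] * dz
    s = m[1][0] * dx + m[1][1] * dy + m[1][2] * dz
    q = m[2][0] * dx + m[2][1] * dy + m[2][2] * dz
    return (x0 - focal_length * r / q, y0 - focal_length * s / q)

# Hypothetical vertical photograph taken 1000 m above the datum, f = 0.05 m.
m = rotation_matrix(0.0, 0.0, 0.0)
ground_point = project_point((100.0, 50.0, 0.0), (0.0, 0.0, 1000.0), m, 0.05)
roof_point = project_point((100.0, 50.0, 500.0), (0.0, 0.0, 1000.0), m, 0.05)
```

In this example the elevated point falls twice as far from the principal point as the ground point with the same planimetric position. This is the relief displacement that has to be accounted for before the photograph can be merged with the laser data.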

Data  Fusion 

Conceptually,  fusion  is  a simple  procedure  that  provides  for  the  combination  of 
multiple  sources  for  the  purpose  of  object  space  identification.  Several  introductory  texts 
have been offered [for example, Hall (1992), Crowley & Demazeau (1993), Klein (1993),
Bloch  (1996),  Dasarathy  (1997),  and  Hall  & Llinas  (1997)].  Also,  a number  of  excellent 
papers  have  provided  surveys  of  multisensor  integration  and  fusion.  Mitiche  and 
Aggarwal  (1986)  provide  a thorough  summary  of  the  issues  involved  in  fusing  images 
from  multiple  dissimilar  sensors.  An  example  is  given  of  the  complementary  information 
contained  in  the  visual,  thermal,  and  range  images  of  a simple  multisensor  system  where 
the  individual  observations  are  acquired  by  various  sensors.  These  observations  are 
functions  of  unknown  object  parameters.  The  task  is  to  determine  the  parameters  by 
evaluation  of  the  observations  using  various  methods  of  data  combination  strategies.  Dai 
and Khorram (1999) express a simple data fusion process in a mathematical framework
by breaking it into separate parts, as shown in Figure 1-1. Each of the processors
$[P_1, P_2, \ldots, P_n]$ classifies single source data $[x_1, x_2, \ldots, x_n]$ and then
produces either decisions or class probabilities $[y_1, y_2, \ldots, y_n]$. Based on the
information provided by $[y_1, y_2, \ldots, y_n]$, the fusion center $F$ performs overall
classification and combines pieces of information from $n$ sources in some way to obtain
a global output $O$:

$$O = F(y_1, y_2, \ldots, y_n) \tag{1-1}$$

In  the  image  classification  problem,  the  global  output  O is  the  final  class  that  is  the 
information  about  the  object  space.  It  is  more  accurate  than  that  achieved  with  a single 
type  of  data,  even  in  the  presence  of  conflicting  data. 
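
Equation (1-1) leaves the combination rule F open. As a minimal sketch, the fusion center below assumes a weighted average of the class probabilities reported by each source. This rule is chosen only for illustration and is not taken from Dai and Khorram (1999); the function name, class labels, and probabilities are hypothetical.

```python
def fuse_decisions(source_outputs, weights=None):
    """Fusion center F: combine per-source class probabilities into one output O."""
    weights = weights or [1.0] * len(source_outputs)
    total = sum(weights)
    classes = source_outputs[0].keys()
    # Weighted average of y_1 ... y_n for every class (an assumed combination rule).
    combined = {
        c: sum(w * y[c] for w, y in zip(weights, source_outputs)) / total
        for c in classes
    }
    # The global output O is the class with the largest combined support.
    return max(combined, key=combined.get), combined

# Hypothetical outputs of three single-source processors for one pixel.
y_color = {"tree": 0.6, "grass": 0.3, "building": 0.1}
y_height = {"tree": 0.5, "grass": 0.1, "building": 0.4}
y_intensity = {"tree": 0.3, "grass": 0.5, "building": 0.2}
label, support = fuse_decisions([y_color, y_height, y_intensity])
```

Here the intensity source alone would favor grass, but the fused output is tree, which shows how the fusion center can resolve conflicting single-source results.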

To  simplify  the  parameter  estimation  process  in  a single  data  source,  data  are  usually 
modeled  using  a multivariate  Gaussian  expression  of  probability.  In  data  fusion 
applications,  such  models  may  be  difficult  to  construct  since  data  from  different  sources 
have different modalities, and the correlations are sometimes very complex. As the data
increase in dimension and complexity, the number of model parameters also increases.

Figure 1-1. General architecture of the data fusion process in image classification
(Dai & Khorram, 1999).
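
The growth in model parameters can be made concrete. A multivariate Gaussian in d dimensions has d mean terms and d(d+1)/2 distinct covariance terms for each class. The short sketch below counts them; the function name and the choice of six classes are hypothetical.

```python
def gaussian_parameter_count(n_features, n_classes=1):
    """Number of free parameters in per-class multivariate Gaussian models."""
    mean_terms = n_features
    # A symmetric covariance matrix has d(d+1)/2 distinct entries.
    covariance_terms = n_features * (n_features + 1) // 2
    return n_classes * (mean_terms + covariance_terms)

rgb_only = gaussian_parameter_count(3)             # three color channels
rgb_height_intensity = gaussian_parameter_count(5)  # color plus height and intensity
six_classes = gaussian_parameter_count(5, n_classes=6)
```

Adding the two laser layers to three color channels raises the count from 9 to 20 parameters per class, so the training data needed to estimate them reliably grows accordingly.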

Based  on  this  simple  procedure,  one  possible  way  to  implement  data  fusion  is  to  make 
separate  classifications  based  on  information  from  each  source  and  then  integrate  these 
decisions.  This  approach  may  be  applicable  in  our  research  where  different  data  layers, 
such  as  digital  color  images,  intensity  data,  and  DEM,  of  the  same  geographic  areas  from 
different  data  sources  (ALSM  and  ADP  systems)  are  available,  and  only  one  information 
layer  is  to  be  derived. 

An  Object  Recognition  Example 

To illustrate this approach to fusion, we provide a simple example of the combination
of spectral reflectance data with height data for object recognition. Four objects are
shown in Figure 1-2 (a). They are distinguished by two independent features: the
chlorophyll absorption feature, which exists in the reflection spectrum, and the
elevation. Sensor 1 provides information concerning the chlorophyll absorption of the
object, and Sensor 2 provides information concerning its elevation. Figure 1-2 (b) shows
hypothetical frequency distributions for “vegetation” and “non-vegetation” objects
(estimated from the chlorophyll absorption feature), representing the sensor’s tested
responses to such objects. The bottom axes of the figures represent the range of possible
sensor readings. The output values $x_1$ correspond to some numerical “degree of
vegetation or non-vegetation” of the object, as determined by the sensor. Because Sensor 1
is not able to detect the elevation of an object, objects A and C (as well as B and D)
cannot be separated. The dark portion of the axis in the figure corresponds to the range
of output values where there is uncertainty as to the chlorophyll absorption of the object
being detected. Figure 1-2 (c) shows the frequency distributions for “high” and “low”
objects resulting from $x_2$. Because Sensor 2 is able to detect the elevation of an
object, objects A and C (as well as B and D) can be distinguished. In Figure 1-2 (d),
complementary information from Sensor 2 concerning the independent feature elevation, as
shown in Figure 1-2 (c), is fused with the chlorophyll absorption information from
Sensor 1 shown in Figure 1-2 (b). As a result of the fusion of both features, it is now
possible to discriminate among all four objects. This increase in discrimination ability
is one of the advantages resulting from the fusion of complementary information.
Importantly, the information resulting from this fusion technique could be at a higher
representational level. The output values $x_1$ and $x_2$ are, for example, numerical
values, whereas the result of the fusion could be a symbol representing one of the four
possible objects. This illustrates an important goal of our approach to fusion: we wish
to move from data to information as accurately as possible.

[Figure 1-2 panels: (a) Four objects (A: Tree, B: Concrete Building, C: Grass,
D: Concrete Sidewalk); (d) Sensors 1 and 2]

Figure 1-2. The discrimination of four different objects using complementary information
from two sensors: (a) Four objects (A, B, C, and D) distinguished by the feature
“chlorophyll absorption” (vegetation vs. non-vegetation) and “elevation” (high vs. low);
(b) 1-D distributions from Sensor 1 (chlorophyll absorption); (c) 1-D distributions from
Sensor 2 (elevation); (d) 3-D distributions resulting from fusion of complementary
information from Sensor 1 (chlorophyll absorption), and Sensor 2 (elevation) [modified
from Abidi and Gonzalez (1992)].
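
The four-object example can be sketched in a few lines. This is an illustration only: the two thresholds and the scaling of the sensor outputs are hypothetical, and the zone of uncertainty between the distributions is ignored.

```python
def classify_object(x1, x2, vegetation_threshold=0.5, height_threshold=2.0):
    """Fuse a vegetation score x1 (Sensor 1) with a height x2 (Sensor 2) into a symbol."""
    is_vegetation = x1 >= vegetation_threshold  # from chlorophyll absorption
    is_high = x2 >= height_threshold            # from elevation
    if is_vegetation:
        return "A: tree" if is_high else "C: grass"
    return "B: concrete building" if is_high else "D: concrete sidewalk"

# Neither sensor alone separates all four objects; the pair of readings does.
symbols = [
    classify_object(0.9, 12.0),
    classify_object(0.1, 9.0),
    classify_object(0.8, 0.1),
    classify_object(0.2, 0.0),
]
```

The inputs are numerical values and the output is a symbol, which is the move to a higher representational level described in the example.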

Potential  Advantages  in  Integrating  Multiple  Sensors 

The potential advantages of fusing data from multiple sensors are the ability to obtain
more accurate information, including information about features that cannot be perceived
with any individual sensor, and to do so in less time and at lower cost. Abidi and Gonzalez
(1992)  stated  that  these  advantages  correspond  to  the  notions  of  redundancy, 
complementarity,  timeliness,  and  cost  of  the  information  provided  by  the  system.  Since 
redundant  information  can  more  easily  be  made  commensurate,  it  can  usually  be  fused  at 
a lower  level  of  representation  compared  to  complementary  information.  Complementary 
information  is  usually  fused  at  a symbolic  level  of  representation,  but  sometimes  it  is 
provided  directly  to  different  parts  of  the  system  without  being  fused.  In  most  cases,  the 
advantages  gained  through  the  use  of  redundant,  complementary,  or  more  timely 
information  in  a system  are  related  to  technological  benefits. 

The  Methods  of  Data  Fusion 

Multiple  heuristic  and  analytical  techniques  for  data  fusion  have  appeared  in  the 
literature  during  the  last  20  years.  Fusion  techniques  have  also  been  used  in  land  cover 
classification  since  the  late  1980s.  From  the  literature,  however,  no  comprehensive 
treatment  of  the  various  strategies  has  been  presented,  and  no  optimal  fusion  technique 
has  yet  been  proposed. 

Some  researchers,  such  as  Abidi  and  Gonzalez  (1992),  Pohl  and  Van  Genderen 
(1998), and Tuell et al. (2001), have generated taxonomies which consider different levels
of representation at which the fusion of the data or information, from multiple sensors or
a single  sensor  over  time,  can  take  place.  Based  on  where  the  combination  takes  place  in 
the  information  extraction  process,  data  fusion  may  be  performed  at  different  stages: 
signal,  pixel,  feature,  and  decision  level.  In  Figure  1-3,  data-,  feature-,  and  decision-level 
fusions  are  shown  (Tuell  et  al.,  2001).  The  signal  and  pixel  level  fusion  have  been 
replaced  with  a single  data-level  fusion. 

Dai  and  Khorram  (1999)  briefly  explained  four  stages  in  their  monograph:  Signal- 
level  fusion  can  be  used  in  real-time  applications  and  can  be  considered  as  an  additional 
step  in  the  overall  processing  of  the  signals;  pixel-level  fusion  can  be  used  to  improve  the 
performance  of  many  image  processing  tasks,  such  as  segmentation;  feature  level  fusion 
is  performed  at  an  intermediate  level  based  on  features  detected;  at  the  decision  level, 
monosource  classification  results  are  combined  based  on  the  predefined  degree  of  belief 
using  some  combination  operators.  Here,  we  follow  the  concept  of  different  levels  of  data 
fusion,  suggested  by  Abidi  and  Gonzalez  (1992). 

Signal-Level  Methods 

Signal-level  fusion  refers  to  the  combination  of  the  signals  of  a group  of  sensors.  The 
objective  of  signal-level  fusion  is  to  provide  a signal  that  is  usually  of  the  same  form  as 
the  original  signals  but  of  greater  quality.  As  compared  to  other  types  of  fusion,  signal- 
level  fusion  requires  the  greatest  degree  of  sensor  registration.  To  perform  signal-level 
fusion  with  multiple  sensors,  their  signals  must  be  in  temporal,  as  well  as  spatial, 
registration.  If  the  signals  from  the  sensors  are  not  synchronized,  their  values  at  common 
points  of  time  need  to  be  estimated  to  put  them  into  temporal  registration. 
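
Temporal registration of unsynchronized signals can be sketched with simple linear interpolation. This is one possible estimator, assumed here for illustration; the function names and sample values are hypothetical, and the sample times must be strictly ascending.

```python
from bisect import bisect_right

def interpolate_at(times, values, t):
    """Linearly interpolate a sampled signal at time t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled interval")
    # Index of the sample that closes the interval containing t.
    i = min(bisect_right(times, t), len(times) - 1)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def register_to(reference_times, times, values):
    """Estimate a signal at the sample times of a reference sensor."""
    return [interpolate_at(times, values, t) for t in reference_times]

# Sensor B is sampled half a time unit out of step with sensor A.
sensor_a_times = [0.0, 1.0, 2.0]
sensor_b_times = [-0.5, 0.5, 1.5, 2.5]
sensor_b_values = [10.0, 12.0, 14.0, 16.0]
sensor_b_registered = register_to(sensor_a_times, sensor_b_times, sensor_b_values)
# sensor_b_registered == [11.0, 13.0, 15.0]
```

Once both signals refer to the same instants, they can be combined sample by sample.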

To  develop  optimal  signal-level  fusion  methods,  certain  assumptions  concerning  the 
nature of the sensory information must be satisfied. The most common assumption is the
use of a measurement model for the information from each sensor that includes a
statistically  independent  additive  Gaussian  error  or  noise  term  (for  example,  location 
data).  A related  assumption  is  the  statistical  independence  between  the  error  terms  for 
each  sensor.  Richardson  and  Marsh  (1988)  provided  an  excellent  introduction  to  the 
conceptual  problems  inherent  in  any  signal-level  fusion  method  based  on  these  common 
assumptions.  In  their  monograph,  they  provided  proof  that  the  inclusion  of  additional 
redundant sensory information almost always improves the performance of any signal-level
fusion method that is based on optimal estimation.

Pixel-Level Methods

When  applied  to  image  data,  signal-level  fusion  is  often  called  pixel-level  fusion. 
Pixel-level  fusion  can  be  used  to  increase  the  information  content  in  each  pixel  in  an 
image  formed  through  a combination  of  multiple  images.  For  example,  the  fusion  of  a 
DEM  image  with  a two-dimensional  intensity  image  adds  height  information  to  each 
pixel  in  the  intensity  image,  which  can  be  useful  in  the  subsequent  processing  of  the 
image.  The  most  obvious  candidates  for  pixel-level  fusion  include  sequences  of  images 
from  a single  imaging  sensor  (for  example,  a multispectral  camera)  and  images  from  a 
group  of  identical  sensors  (for  example,  stereo  vision).  The  fused  image  can  be  created 
through pixel-by-pixel fusion in each of the component images. The information
associated  with  each  pixel  in  a component  image  can  be  considered  as  an  additional 
dimension  of  the  information  associated  with  its  corresponding  pixel  in  the  fused  image 
(for  example,  the  two  dimensions  of  height  and  intensity  associated  with  each  pixel  in  a 
fused  height-intensity  image). 
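
The fused height-intensity image can be sketched as a raster whose pixels hold one value per component image. The sketch assumes the two layers are already registered on the same grid; the function name and sample values are hypothetical.

```python
def stack_layers(height, intensity):
    """Fuse two co-registered rasters into one raster of (height, intensity) pixels."""
    same_rows = len(height) == len(intensity)
    same_cols = all(len(h) == len(i) for h, i in zip(height, intensity))
    if not (same_rows and same_cols):
        raise ValueError("layers must share the same grid")
    # Each fused pixel gains one dimension per component image.
    return [list(zip(h_row, i_row)) for h_row, i_row in zip(height, intensity)]

# Hypothetical 2 x 2 rasters: heights in meters, intensities in digital numbers.
height = [[1.0, 9.5], [1.2, 9.7]]
intensity = [[40, 200], [42, 190]]
fused = stack_layers(height, intensity)
# fused[0][1] == (9.5, 200)
```

A classifier can then treat each fused pixel as a single feature vector.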

In  order  for  pixel-level  fusion  to  be  feasible,  the  data  provided  by  each  sensor  must  be 
registered and, in most cases, must be sufficiently similar in terms of its resolution and
information content. Sensor registration is not a problem when a single sensor is used.
However,  when  multiple  sensors  are  used,  each  sensor  should  be  able  to  provide  images 
of  the  same  resolution.  If  the  images  to  be  fused  have  different  resolutions,  then  a 
resampling  of  images  can  be  applied  to  have  the  same  resolution  or  a mapping  must  be 
specified  between  corresponding  regions  in  the  images. 

The  sensors  used  for  pixel-level  fusion  need  to  be  accurately  coaligned  so  that  their 
images  will  be  in  spatial  registration.  This  is  usually  achieved  through  locating  the 
sensors  on  the  same  platform. 

The fusion of multisensor data at the pixel level can serve to increase the information
content  of  an  image  so  that  more  reliable  segmentation  can  take  place  and  more 
discriminating  features  can  be  extracted  for  further  processing. 

Feature-Level  Methods 

Feature-level  fusion  can  be  used  to  increase  the  likelihood  that  a feature  extracted 
from  the  data  provided  by  a sensor  actually  corresponds  to  an  important  aspect  of  the 
object  space.  This  fusion  also  can  be  used  as  a means  of  creating  additional  composite 
features  for  use  by  the  system.  Features  provide  for  data  abstraction.  A “primary”  feature 
is  created  through  the  results  of  the  processing  of  some  spatial  and/or  temporal  segment 
of  sensory  data  with  some  type  of  semantic  meaning,  whereas  a “composite  feature”  is 
created  through  a combination  of  existing  features.  Typical  features  extracted  from  an 
image  and  used  for  fusion  include  edges  and  regions  of  similar  intensity  or  depth.  When 
multiple sensors report similar features at the same location in the environment, the
likelihood that the features are actually present is increased, and the accuracy with
which they are measured may be improved. Features without such support, however, can
be considered as spurious artifacts and eliminated. A feature created as a result of the
fusion process may be either a composite of the component features (for example, an
edge that is composed of segments of edges detected by different sensors) or an entirely
new type of feature that is composed of the attributes of its component features (for
example, a three-dimensional edge formed through the fusion of corresponding edges in
the images provided by stereo cameras).

Figure 1-3. Data-, feature-, and decision-level fusion (Tuell et al., 2001).
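
The idea of keeping only features that a second sensor supports can be sketched as follows. Features are represented here as pixel coordinates of detected edge points, and the one-pixel tolerance is a hypothetical allowance for the looser registration of feature-level fusion; neither choice comes from the cited literature.

```python
def supported_features(features_a, features_b, tolerance=1):
    """Keep features from sensor A that sensor B confirms within `tolerance` pixels."""
    def is_confirmed(feature):
        row, col = feature
        return any(
            abs(row - r) <= tolerance and abs(col - c) <= tolerance
            for r, c in features_b
        )
    # Features without support from the other sensor are treated as spurious.
    return {f for f in features_a if is_confirmed(f)}

# Hypothetical edge points from two sensors, as (row, column) pairs.
edges_a = {(10, 10), (10, 11), (40, 3)}
edges_b = {(10, 10), (11, 12)}
kept = supported_features(edges_a, edges_b)
# kept == {(10, 10), (10, 11)}
```

The isolated point at (40, 3) has no counterpart in the second sensor and is dropped.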

As  compared  to  signal-  and  pixel-level  fusion,  the  sensor  registration  requirements  for 
feature-level  fusion  are  less  strict.  The  geometric  transformation  of  a feature  can  be  used 
to  bring  it  into  registration  with  other  features.  The  geometrical  form,  orientation,  and 
position  of  a feature,  together  with  its  temporal  extent,  are  the  most  important  aspects  of 
the  feature  that  need  to  be  registered  and  fused  with  other  features. 

Decision-Level  Methods 

Decision-level  fusion  has  the  highest  level  of  abstraction  to  allow  the  information 
from  multiple  sensors  to  be  effectively  used  together.  If  the  sensors  are  very  dissimilar  or 
refer to different regions of the environment, decision-level fusion may be the only means
by  which  sensory  information  can  be  fused.  Decision-level  fusion  is  sometimes  termed 
“symbol-level  fusion.”  A symbol  derived  from  sensory  information  is  a representation  of 
a decision  that  has  been  made  concerning  some  aspect  of  the  environment.  Sensor 
registration  is  normally  not  explicitly  considered  in  the  generation  of  the  symbol.  If  the 
symbols  to  be  fused  are  not  in  registration,  spatial  and  temporal  attributes  associated  with 
symbols  can  be  used  for  their  registration. 

Decision-level  fusion  uses  different  forms  of  logical  and  statistical  inference.  In 
logical  inference,  the  individual  symbols  to  be  fused  represent  terms  in  logical 
expressions, and their uncertainty measures correspond to the truth values of the terms. In
statistical inference, the individual symbols to be fused are represented as conditional
probability  expressions,  and  the  uncertainty  measures  represent  the  probability  measures 
associated  with  the  expressions.  The  improvement  in  quality  associated  with  decision- 
level  fusion  can  be  represented  by  the  increase  in  the  truth  or  probability  values  of  the 
symbol  created  as  a result  of  the  inference  process. 

Henkind and Harrison (1988) compared and analyzed the treatment of uncertainty in
four decision-level fusion techniques: Bayesian inference, the Dempster-Shafer method,
fuzzy set theory, and production rule-based systems using confidence factors. The
computational complexity of these techniques is compared, and their underlying
assumptions are made explicit. Cheng and Kashyap (1988) compared the use of
Bayesian inference and the Dempster-Shafer method for evidence combination.

Of these methods, the production rule-based expert system method and the
Dempster-Shafer method are summarized in this section.

1. Rule-based algorithm: the production rule-based method has also been called an
expert system or knowledge-based system for image interpretation. Rules utilize one of
two forms: production rules asserting antecedent-consequence relations or logic rules
asserting consequence-antecedent relations. Rules consist of premise-action pairs, for
example:

If P1 & ... & Pm,

Then Q1 & ... & Qn.

with the reading “if premises P1 and ... and Pm are true, then perform actions Q1 and ...
and Qn.” The Pi are sometimes called “conditions” and the Qj “conclusions,” since the
most common action is to conclude that a certain proposition is true, often with some
degree of confidence (Jackson, 1986).

The main drawback of this approach is its strong dependence on domain knowledge
and the rigidity of rules. Rule-based image interpretation systems tend to be nonrobust
and context specific. In large-scale expert systems, hundreds or thousands of rules may
be required to adequately represent the required expertise. The rule-based algorithm and
a literature review will be described with experiments in Chapter 3.

2. Dempster-Shafer method: Dempster and Shafer introduced a generalization of the
Bayesian inference method that allows for a general level of uncertainty; it is presented in
the book A Mathematical Theory of Evidence (Shafer, 1976; Lawrance and Garvey, 1982;
Hall, 1992). The Dempster-Shafer technique updates an a priori mass assignment
function to obtain an a posteriori evidential interval. The evidential interval quantifies the
measure of belief in a proposition and its plausibility (lack of evidence refuting the
hypotheses). Mass assignment functions provide the analogue to the Bayesian
probability. The Dempster-Shafer method relaxes the Bayesian restriction on mutually
exclusive hypotheses by assigning evidence to propositions rather than hypotheses (Hall,
1992). In addition, the Dempster-Shafer technique provides a general level of
uncertainty; that is, it does not require an exhaustive set of hypotheses to be defined.
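The combination step of the Dempster-Shafer method summarized above can be illustrated with a minimal sketch of Dempster's rule of combination. The class labels, the two source names, and all mass values below are hypothetical and chosen only for illustration; they are not taken from this research.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass assignment functions.

    Each function maps a proposition (a frozenset of class labels) to its
    mass; the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass falling on the empty set measures the conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so the combined masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical frame of discernment and mass assignments (illustration only).
FRAME = frozenset({"tree", "grass", "road"})
m_color = {frozenset({"tree", "grass"}): 0.7, FRAME: 0.3}
m_laser = {frozenset({"tree"}): 0.6, frozenset({"road"}): 0.1, FRAME: 0.3}

fused = combine(m_color, m_laser)
tree = frozenset({"tree"})
interval = (belief(fused, tree), plausibility(fused, tree))
print("evidential interval for 'tree':", interval)
```

Mass assigned to the whole frame represents ignorance, which is how this formulation carries a general level of uncertainty; the pair (belief, plausibility) is the evidential interval for the proposition.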

In this research, the potential of data fusion strategies was investigated for
high-resolution land cover mapping. First, the traditional and well-known maximum likelihood
classification algorithm was applied to the fused data set, and then the application of two
decision-level fusion techniques, the Dempster-Shafer method and the rule-based algorithm,
was investigated. More detail about this theory, a literature review, and
experiments will be presented in Chapter 4.

For the data fusion research, we combined the high resolution digital color images
from the ADP system, the high accuracy Digital Elevation Model (DEM), and the near
infrared intensity image from the ALSM system.
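The premise-action rule form described earlier in this section (“if P1 and ... and Pm, then Q1 ... Qn,” with a degree of confidence) can be sketched over height and intensity inputs as follows. This is a minimal illustration only: the thresholds, class names, and confidence values are hypothetical and are not the rules developed in Chapter 3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Pixel = Dict[str, float]


@dataclass
class Rule:
    premises: List[Callable[[Pixel], bool]]  # P1 ... Pm
    conclusion: str                          # proposition concluded to be true
    confidence: float                        # degree of confidence in it

    def fires(self, pixel: Pixel) -> bool:
        # A rule fires only if every premise holds.
        return all(premise(pixel) for premise in self.premises)


# Hypothetical rules; height in meters, intensity scaled to 0..1.
RULES = [
    Rule([lambda p: p["height"] > 3.0, lambda p: p["intensity"] > 0.5], "tree", 0.8),
    Rule([lambda p: p["height"] > 3.0, lambda p: p["intensity"] <= 0.5], "building", 0.7),
    Rule([lambda p: p["height"] <= 3.0, lambda p: p["intensity"] < 0.2], "road", 0.9),
]


def classify(pixel: Pixel, rules: List[Rule] = RULES) -> Optional[Tuple[str, float]]:
    """Return the conclusion of the most confident rule that fires, if any."""
    fired = [rule for rule in rules if rule.fires(pixel)]
    if not fired:
        return None
    best = max(fired, key=lambda rule: rule.confidence)
    return best.conclusion, best.confidence


print(classify({"height": 10.0, "intensity": 0.8}))
print(classify({"height": 0.1, "intensity": 0.1}))
print(classify({"height": 0.1, "intensity": 0.6}))  # no rule covers this case
```

The last call returns no conclusion, which illustrates the rigidity noted above: a pixel outside the conditions anticipated by the rule author is simply left unclassified.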

Data  Acquisition 

A test site was established at the University of Florida in Gainesville, and data were
acquired on October 24, 2000. The ALSM data and digital color images were acquired
simultaneously from a Cessna 337 Skymaster twin-engine aircraft. The parameters used for
this project are shown in Table 1-1. The intensity image from the ALSM system for this
research is shown in Figure 1-4, and the corresponding DEM from the ALSM system is
shown in Figure 1-5.

The ALSM data were collected at a pulse repetition rate of 10,000 pulses per second
and processed with precise ephemeris global positioning system (GPS) data. Direct
measurements of sensor pitch, roll, and heading were used to compute the positions of the
individual laser pulses. The intensity image and DEM were generated using commercial
mapping software packages. Figure 1-6 shows the digital color image that was acquired
almost simultaneously with the ALSM data.


Table 1-1. Laser System Parameters

Parameter               October 24, 2000
Pulse Repetition Rate   10 kHz
Scan Angle              20°
Scan Rate               15 Hz
Flying Height           600 meters
Speed                   46.3 m/sec

Figure 1-4. The intensity image from the ALSM data.

Figure 1-5. The elevation image from the ALSM data.

Figure 1-6. Digital color image from the ADP system.

Outline 

Chapter 2 explains the application of supervised classification to the combined ALSM
DEM, intensity, and ADP digital color images for land cover classification. The
maximum likelihood classifier, which is the most common technique, was applied to
several different band combinations drawn from five different data sources: red, green,
blue, near infrared intensity, and DEM.

Chapter 3 describes the expert system for multisensor data fusion for image
interpretation and includes a literature review. The two-band image consisting of near infrared
intensity and elevation data, which we named an HI image, and the three-band digital color
images from the ADP system were classified using level-1 classification rules. Then the two
different rule-based classification images were fused using the level-2 classification rules.
Within the level-2 classification, the data from each of the level-1 classifications are used
to derive information for object space classification. We will show how the rules were
created in both classification levels and how the initial data were evaluated to create the
rules.

Chapter  4 presents  the  literature  review  of  the  Dempster-Shafer  evidential  theory  and 
investigates  the  problem  of  how  to  combine  data  from  multiple  sensors.  The  HI  image 
from  the  ALSM,  and  the  digital  color  images  from  the  ADP  system  were  combined  by 
Dempster’s  combination  rules  of  the  theory.  The  acquisition  of  a priori  probabilities  for 
this  approach  is  a key  problem,  and  we  will  explain  how  we  handled  this  problem  using 
the  maximum  likelihood  classifier. 

Chapter 5 contains the conclusions. We will compare three different results: those from the
supervised statistical pattern recognition technique, the rule-based expert system
classification, and the classification using the Dempster-Shafer evidential theory.

Chapter  6 discusses  future  research.  We  will  show  how  the  results  from  this  work  can 
be  used  for  additional  research. 

In order to make this document self-contained, we summarized several important
concepts and procedures in the appendices. In Appendix A, we describe important
characteristics of the University of Florida Airborne Laser Swath Mapping (ALSM)
system and the Airborne Digital Photography (ADP) system. In Appendix B, we present
the procedure used in our work to coregister the digital color image from the ADP system
with the HI image from the ALSM system. In Appendix C, we explain the characteristics
of intensity data from the ALSM system and the correlation between intensity and target
reflectance. In Appendix D, we provide a brief overview of the maximum likelihood
classification. Our work in Chapters 2 and 4 depends on this material. In Appendix E, we
give a brief overview of the approach to accuracy assessment in multispectral remotely
sensed data.




CHAPTER  2 

CLASSIFICATION  USING  A SUPERVISED  STATISTICAL  PATTERN 
RECOGNITION  TECHNIQUE 

Image interpretation, or classification, has been an active area of research in remote
sensing for decades. Most researchers have applied basic algorithms of supervised and
unsupervised classification, but these approaches may not be satisfactory for sensor
fusion. In this work, the feature-level fusion of two sets of data from different sources
was investigated to classify ground objects using a supervised statistical pattern
recognition technique. In this chapter, we describe the application of the maximum
likelihood classifier to the combined data, including DEM, intensity, and digital color
images, for land cover classification.

In the following discussion, the ALSM data are fused with digital color
photographs, and the intensity signal of the ALSM is tested as an
infrared source.

Data  Fusion  of  Digital  Color  Imagery,  ALSM  Intensity,  and  DEM 

In this research, we integrated an HI band (DEM and intensity) image from the ALSM
system and a high-resolution RGB digital color photograph from the ADP system.
Feature-level fusion is shown in Figure 2-1 (Tuell et al., 2001). The digital color images
from the ADP system have been georeferenced using the DDIG procedure, as explained
in Appendix B, and resampled by cubic convolution. After this procedure, the two
images were spatially registered for the fusion. If the data have the same pixel size and
correct geometric information, it is possible to integrate the image data and extend them to
the full-band combination. For example, we can form five-band combination images from
the three-band digital color (red, green, and blue) images and the two-band HI (height and
intensity) images. With five bands, it is possible to have 10 three-band combinations, five
four-band combinations, and one five-band combination. In this research, six trial band
combinations were constructed and tested (see Figure 2-2): red, green, and intensity (RGI);
red, green, and height (RGH); red, height, and intensity (RHI); red, green, height, and
intensity (RGHI); red, blue, height, and intensity (RBHI); and red, green, blue, height, and
intensity (RGBHI). The merged image of near infrared laser intensity with the green
band and red band from color photography is shown in Figure 2-3 (a). The building areas
can be differentiated from the vegetation areas by their different colors. The combination
of near infrared laser intensity, height, and the red band is shown in Figure 2-3 (b). The
grass areas are distinguished by near infrared laser intensity in this image, and trees are
also clearly separated because of the height data.

Figure 2-1. Feature-level data fusion (Tuell et al., 2001).

Figure 2-2. Six trial band combinations from an RGB image and an HI image.

Figure 2-3. Band combination images: (a) RGI (red, green, and intensity); (b) RHI (red,
height, and intensity) combinations.

Methodology

In supervised classification, the selection of training sets is very important because
well-chosen training sets allow classes to be distinguished in more detail, which results in
better classification. Using the commercial image processing software ENVI, regions of
interest (ROI) for eight classes that were easy to distinguish were carefully selected as
training data. The training sets for each class were labeled by manually comparing the
color image with field survey observations. Figure 2-4 shows the training data on the RGB
color image. These training data were applied to each trial band combination to produce a
classification map for each.

In this work, we used maximum likelihood classification, which is the most
powerful classification method in common use. Based on statistical parameters, such as
the mean and the variance/covariance, a probability function is calculated from the inputs
for the classes established from the training sites. Each pixel is then assigned to the class
to which it most probably belongs. A detailed description of maximum likelihood
classification is presented in Appendix D. The maximum likelihood classifier of the ENVI
software was applied to classify the trial band combinations.
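The decision rule described above can be sketched for a two-band case as follows. This is not the ENVI implementation; it is a minimal illustration assuming Gaussian class distributions and equal prior probabilities, and the training values (height in meters, normalized intensity) are hypothetical.

```python
import math


def fit_class(samples):
    """Mean vector and 2x2 covariance (c11, c12, c22) of two-band training samples."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    c11 = sum((s[0] - m1) ** 2 for s in samples) / (n - 1)
    c22 = sum((s[1] - m2) ** 2 for s in samples) / (n - 1)
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / (n - 1)
    return (m1, m2), (c11, c12, c22)


def discriminant(pixel, mean, cov):
    """Log-likelihood of the pixel under the class Gaussian, up to a constant."""
    c11, c12, c22 = cov
    det = c11 * c22 - c12 * c12
    d1, d2 = pixel[0] - mean[0], pixel[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    mahalanobis = (c22 * d1 * d1 - 2.0 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return -0.5 * (math.log(det) + mahalanobis)


def classify(pixel, stats):
    """Assign the pixel to the class with the largest discriminant value."""
    return max(stats, key=lambda label: discriminant(pixel, *stats[label]))


# Hypothetical training sites: (height, intensity) pairs per class.
TRAINING = {
    "grass": [(0.10, 0.80), (0.20, 0.85), (0.30, 0.78), (0.15, 0.90), (0.25, 0.82)],
    "tree": [(8.0, 0.70), (10.0, 0.75), (12.0, 0.65), (9.0, 0.72), (11.0, 0.68)],
    "road": [(0.00, 0.10), (0.10, 0.12), (0.05, 0.08), (0.20, 0.11), (0.10, 0.09)],
}
STATS = {label: fit_class(samples) for label, samples in TRAINING.items()}

for pixel in [(0.2, 0.83), (9.5, 0.70), (0.1, 0.10)]:
    print(pixel, "->", classify(pixel, STATS))
```

With more bands, the same rule applies with the full covariance matrix of each class in place of the 2x2 case written out here.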

The results from this study show a comparison of the accuracy assessments from a
number of data combinations.

Analysis  and  Results 

One of the most common means of expressing classification accuracy is the
preparation of a confusion matrix (sometimes called a contingency matrix) to show the
accuracy of a classification result by comparing a classification result with ground truth
information. More detail for accuracy assessment of classification data is provided in
Appendix E. We applied accuracy assessment using the error matrix to each band
combination.

Figure 2-4. Ground truth reference data on an RGB color image.

When we use only the RGB color band combination for the maximum likelihood
classification, the result from accuracy assessment shows 67.4768% overall accuracy. In
the RGB band combination, the brick class has the highest producer’s accuracy (96.20%)
and the road class has the lowest (54.28%). The producer’s accuracy and the
user’s accuracy were not acceptable. Table 2-1 shows the confusion matrix analyses
between classified data and reference data using an RGB image. Figure 2-5 (a) shows the
maximum likelihood classification of an RGB color image alone. The classification
image of RGB shows unclear boundaries between trees and grass and also between
concrete buildings and concrete sidewalks, but it helps to classify buildings, sidewalks,
and metal with clear boundaries. The result shows that an RGB color image does not
provide enough data for classifying ground objects.

Figure 2-5 (b) shows the maximum likelihood classification of the HI image alone.
The overall accuracy of classification of the HI image was
70.5898%. The road class had the highest accuracy, 99.34%, because the reflectance of
asphalt is lower than that of any other object and is clearly separable in the intensity band, as
explained in Appendix C. The grass and tree classes had higher accuracies than in the RGB
image, whereas the building class did not (72.90% versus 87.74% producer’s accuracy).
Table 2-2 shows the confusion matrix analyses
between classified data and reference data using an HI image. The classification image in
Figure 2-5 (b) shows that it is difficult to distinguish a concrete building from metal, a car
from a concrete sidewalk, and a tree from a building because they are almost the same in
height and intensity range.
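The accuracy measures reported for each band combination can be reproduced from a confusion matrix as sketched below, here using the pixel counts of Table 2-1 (RGB image). The sketch assumes that rows are classified data and columns are reference data, which is consistent with the row and column totals of the table; the function and variable names are illustrative only.

```python
CLASSES = ["Building", "Tree", "Grass", "Road", "Sidewalk", "Brick", "Car", "Metal"]

# Table 2-1: rows = classified data, columns = reference data (pixels).
RGB_MATRIX = [
    [544, 0, 0, 1, 136, 0, 17, 6],
    [1, 504, 266, 59, 0, 0, 0, 0],
    [9, 246, 393, 54, 7, 0, 0, 0],
    [5, 38, 35, 165, 130, 0, 0, 0],
    [59, 0, 0, 25, 525, 6, 3, 7],
    [2, 0, 0, 0, 0, 152, 0, 3],
    [0, 0, 0, 0, 0, 0, 121, 19],
    [0, 0, 0, 0, 0, 0, 57, 67],
]


def accuracy_report(matrix):
    """Overall accuracy, kappa, and per-class producer's and user's accuracies."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]                  # classified
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference
    overall = sum(diagonal) / total
    # Agreement expected by chance from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1.0 - chance)
    producers = [d / c for d, c in zip(diagonal, col_totals)]
    users = [d / r for d, r in zip(diagonal, row_totals)]
    return overall, kappa, producers, users


overall, kappa, producers, users = accuracy_report(RGB_MATRIX)
print(f"Overall accuracy = {100 * overall:.4f}%  Kappa = {kappa:.4f}")
for name, p, u in zip(CLASSES, producers, users):
    print(f"{name:9s} producer's {100 * p:6.2f}%  user's {100 * u:6.2f}%")
```

Producer's accuracy divides the correctly classified pixels by the reference total of the class (omission errors), while user's accuracy divides by the classified total (commission errors).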


Table 2-1. Confusion Matrix Between Classified Data and Reference Data Using an RGB
Image (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       544     0      0     1       136      0   17      6    704
Tree             1   504    266    59         0      0    0      0    830
Grass            9   246    393    54         7      0    0      0    709
Road             5    38     35   165       130      0    0      0    373
Sidewalk        59     0      0    25       525      6    3      7    625
Brick            2     0      0     0         0    152    0      3    157
Car              0     0      0     0         0      0  121     19    140
Metal            0     0      0     0         0      0   57     67    124
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (2471/3662) = 67.4768%
Kappa Coefficient = 0.6089

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      54.28         44.24         165/304            165/373
Sidewalk  65.79         84.00         525/798            525/625
Car       61.11         86.43         121/198            121/140
Grass     56.63         55.43         393/694            393/709
Tree      63.96         60.72         504/788            504/830
Building  87.74         77.27         544/620            544/704
Metal     65.69         54.03         67/102             67/124
Brick     96.20         96.82         152/158            152/157

The fusion of the RGB image and the HI image can increase the ability to classify
ground objects because the two sources provide complementary information. The intensity,
which is high for vegetation, can help to separate vegetated areas from non-vegetated
areas, and the DEM gives useful data to separate grass from trees and buildings from
sidewalks. The RGB bands can help to classify buildings, sidewalks, bricks, and metal with
clear boundaries.

Table 2-2. Confusion Matrix Between Classified Data and Reference Data Using an HI
Image (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       452   138      0     0         0      0    0     36    626
Tree            59   638      0     0         0      0    0     19    716
Grass            3     0    597     0        90      0  117      0    807
Road             1     0      6   302        37      7   11      0    364
Sidewalk        10     0      8     0       413     17   36      0    484
Brick            0     0      0     0       101    102    0      0    203
Car             39     0     83     2       157     32   34      0    347
Metal           56    12      0     0         0      0    0     47    115
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (2585/3662) = 70.5898%
Kappa Coefficient = 0.6506

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      99.34         82.97         302/304            302/364
Sidewalk  51.75         85.33         413/798            413/484
Car       17.17         9.80          34/198             34/347
Grass     86.02         73.98         597/694            597/807
Tree      80.96         89.11         638/788            638/716
Building  72.90         72.20         452/620            452/626
Metal     46.08         40.87         47/102             47/115
Brick     64.56         50.25         102/158            102/203

Figure 2-5. Maximum likelihood classification: (a) RGB (red, green, and blue); (b) HI
(height and intensity) band combinations.
Legend: Building, Asphalt Road, Concrete Sidewalk, Median Brick, Tree, Metal on the roof,
Grass, Car.

First, we added intensity data to the red and green bands. In the RGI band combination,
the overall accuracy is 76.9525%. Table 2-3 shows the confusion matrix analyses
between classified data and reference data using the RGI combination. Comparing the RGB
and RGI combinations shows how the intensity data contribute to the classification of
ground objects. The RGI combination has higher producer’s accuracy than the RGB image
alone for the road, sidewalk, grass, and tree classes, although the building and brick
classes are lower. Figure 2-6 (a) shows the maximum likelihood classification of the
RGI band combination. In the RGI band combination, parts of the building were
classified as sidewalk, but when height data are provided, the building class can
be clearly discerned. Compared to the classification of RGB only, the classification
image of RGH, which includes height data, shows clearer boundaries between tree and
grass. The concrete sidewalk class was clearly separated from concrete buildings. However,
some grass area was classified as asphalt road.

Table 2-3. Confusion Matrix Between Classified Data and Reference Data Using an RGI
Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       449     3      0     5       129      0    9      5    600
Tree             1   602     54    90         0      0    0      0    747
Grass            5    19    640     0        37      1    1      0    703
Road             3   100      0   209        23      5    0      0    340
Sidewalk       117    64      0     0       581      3    1      7    773
Brick            0     0      0     0         1    149    0      3    153
Car              0     0      0     0         0      0  121     20    141
Metal           45     0      0     0        27      0   66     67    205
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (2818/3662) = 76.9525%
Kappa Coefficient = 0.7233

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      68.75         61.47         209/304            209/340
Sidewalk  72.81         75.16         581/798            581/773
Car       61.11         85.82         121/198            121/141
Grass     92.22         91.04         640/694            640/703
Tree      76.40         80.59         602/788            602/747
Building  72.42         74.83         449/620            449/600
Metal     65.69         32.68         67/102             67/205
Brick     94.30         97.39         149/158            149/153

In the RGH band combination, the overall accuracy is 83.12%, which is higher than
the overall accuracy of the RGI band combination. With height data, the road, sidewalk,
tree, and building classes were classified with higher accuracy than in the RGB band
combination, and the tree and building classes in particular have acceptable user’s
accuracies. The producer’s accuracies of the car, grass, metal, and brick classes decreased,
while the user’s accuracies for these classes increased and are acceptable. Table 2-4
shows the confusion matrix analyses between classified data and reference data using the
RGH combination. Figure 2-6 (b) shows the maximum likelihood classification of the
RGH band combination.

Table 2-4. Confusion Matrix Between Classified Data and Reference Data Using an RGH
Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       549    19      0     0         0      0    0     54    622
Tree             0   769      0     0         0      0    0      0    769
Grass           23     0    555    62         6      8    0      3    657
Road             6     0    138   214       106      4    0      0    464
Sidewalk        42     0      1    28       686      0  116      2    879
Brick            0     0      0     0         0    146    0      0    146
Car              0     0      0     0         0      0   82      0     82
Metal            0     0      0     0         0      0    0     43     43
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3044/3662) = 83.1240%
Kappa Coefficient = 0.7951

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      70.39         46.12         214/304            214/464
Sidewalk  85.96         78.04         686/798            686/879
Car       41.41         100.00        82/198             82/82
Grass     79.97         84.47         555/694            555/657
Tree      97.59         100.00        769/788            769/769
Building  88.55         88.26         549/620            549/622
Metal     42.16         100.00        43/102             43/43
Brick     92.41         100.00        146/158            146/146

Figure 2-6. Maximum likelihood classification: (a) RGI (red, green, and intensity); (b)
RGH (red, green, and height) band combinations.
Legend: Building, Asphalt Road, Concrete Sidewalk, Median Brick, Tree, Metal on the roof,
Grass, Car.

In the RHI band combination, the red band was added to the HI image. The red band
is one of the most important bands for vegetation discrimination because it is the
chlorophyll absorption band of healthy green vegetation. This band may exhibit more
contrast than the blue and green bands because of the reduced effect of atmospheric
attenuation (Jensen, 1996). The discussion of the characteristics of intensity in Appendix
C shows that grass and concrete were difficult to discern using only ALSM height and
intensity. In the RHI band combination, the overall accuracy of classification
increased to 89.43%. Using the characteristics of the red band and the near-infrared
intensity band, the vegetation was separated from other objects, and the grass and tree
classes were then separated with height data. The road class was classified with high
accuracy because of its significantly low reflectance in intensity at the laser wavelength
(1064 nm). The accuracy of the building class also increased with the height data. However,
with the loss of the green band, the accuracy of the brick class decreased significantly in
both the producer’s accuracy and the user’s accuracy. We need to check whether adding
the green band to the RHI bands may increase the accuracy of the brick class. The
accuracy of the sidewalk class in the RHI band combination was lower than that in the RGH,
and the accuracy of metal in the RHI was also lower than in the RGI band combination.
Table 2-5 presents the confusion matrix analysis between classified data and reference
data using the RHI combination that resulted when using the red band of the digital color
image and the ALSM intensity and DEM. Figure 2-7 (a) shows the maximum likelihood
classification of the RHI band combination. The classification image shows that some
concrete sidewalk was classified as brick because they are in the same height and spectral
ranges.

Table 2-5. Confusion Matrix Between Classified Data and Reference Data Using an RHI
Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       563    16      0     0         0      0    0     54    633
Tree            16   772      0     0         0      0    0      0    788
Grass           18     0    684     0        42      6    2      0    752
Road             1     0      6   299        37     18    0      0    361
Sidewalk        10     0      4     1       672     31   62      0    780
Brick            0     0      0     4        47    103    0      0    154
Car             12     0      0     0         0      0  134      0    146
Metal            0     0      0     0         0      0    0     48     48
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3275/3662) = 89.4320%
Kappa Coefficient = 0.8721

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      98.36         82.83         299/304            299/361
Sidewalk  84.21         86.15         672/798            672/780
Car       67.68         91.78         134/198            134/146
Grass     98.56         90.96         684/694            684/752
Tree      97.97         97.97         772/788            772/788
Building  90.81         88.94         563/620            563/633
Metal     47.06         100.00        48/102             48/48
Brick     65.19         66.88         103/158            103/154

By adding green to RHI, the overall accuracy of the RGHI band combination
increased to 91.18%. The accuracy of the sidewalk and brick classes increased, but the
accuracy of the others decreased slightly. The significant increase for the brick class
increases the overall accuracy. The accuracies of most classes were highly acceptable, but
the accuracies of the car and metal classes were low and not acceptable. We expected that
more data would increase the accuracy for every class. However, in this work, adding one
more set of data helped to increase the accuracy of the brick class and make it acceptable,
but it also caused a decrease for other classes. We will return to this problem in the
discussion. Table 2-6 presents the confusion matrix analysis between classified data and
reference data using the RGHI combination. Figure 2-7 (b) shows the maximum likelihood
classification of the RGHI band combination.

Table 2-6. Confusion Matrix Between Classified Data and Reference Data Using an
RGHI Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       554    16      0     1         0      0    0     54    625
Tree            28   772      0     1         0      0    0      1    802
Grass           17     0    684     0        37      6    1      0    745
Road             1     0      6   298        42      0    0      3    347
Sidewalk        19     0      4     4       719      7   74      0    830
Brick            0     0      0     0         0    145    0      0    145
Car              0     0      0     0         0      0  123      0    123
Metal            1     0      0     0         0      0    0     44     45
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3339/3662) = 91.1797%
Kappa Coefficient = 0.8930

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      98.03         85.88         298/304            298/347
Sidewalk  90.10         86.63         719/798            719/830
Car       62.12         100.00        123/198            123/123
Grass     98.56         91.81         684/694            684/745
Tree      97.97         96.26         772/788            772/802
Building  89.35         88.64         554/620            554/625
Metal     43.14         97.78         44/102             44/45
Brick     91.77         100.00        145/158            145/145

Instead of the green band, the blue band was then added to the RHI band combination.
The overall accuracy was 91.3162%. Compared to the RGHI band combination, the
accuracies of the road, sidewalk, tree, and metal classes increased and those of the car,
building, and brick classes decreased. The accuracy of the grass class was the same. The
car and metal classes still had unacceptably low accuracies. Table 2-7 presents the
confusion matrix analysis between classified data and reference data using the RBHI
combination, which resulted when using red and blue of the digital color image and the
ALSM intensity and DEM. Figure 2-8 (a) shows the maximum likelihood classification of
the RBHI band combination.
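The overall accuracies reported in Tables 2-1 through 2-8 can be gathered for a side-by-side comparison, as in the short sketch below. The values are copied from the tables; the variable names and the choice of the RGB result as the baseline are assumptions of this example.

```python
# Overall accuracies (%) as reported in Tables 2-1 through 2-8.
OVERALL_ACCURACY = {
    "RGB": 67.4768,
    "HI": 70.5898,
    "RGI": 76.9525,
    "RGH": 83.1240,
    "RHI": 89.4320,
    "RGHI": 91.1797,
    "RBHI": 91.3162,
    "RGBHI": 91.0158,
}

# Rank the band combinations from most to least accurate.
ranking = sorted(OVERALL_ACCURACY.items(), key=lambda item: item[1], reverse=True)

# Gain of each combination over the RGB image alone, in percentage points.
gain_over_rgb = {
    name: value - OVERALL_ACCURACY["RGB"] for name, value in OVERALL_ACCURACY.items()
}

for name, value in ranking:
    print(f"{name:6s} {value:8.4f}%  ({gain_over_rgb[name]:+.4f} points vs. RGB)")
```

The listing makes the pattern discussed above easy to see: the three combinations that include both height and intensity with at least two color bands differ by less than half a percentage point, so adding a further color band does not necessarily raise the overall accuracy.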

The  RGBHI  band  combination  using  all  different  data  types  was  integrated  and 
classified  by  the  maximum  likelihood  classification.  Table  2-8  presents  the  confusion 
matrix  analysis  between  classified  data  and  reference  data  using  the  RGBHI  combination, 
which resulted when using red, green and blue of the digital color image and the ALSM

Figure 2-7. Maximum likelihood classification: (a) RHI (red, height, and intensity); (b)
RGHI (red, green, height, and intensity) band combinations.
Legend: Building, Asphalt Road, Concrete Sidewalk, Median Brick, Tree, Metal on the roof,
Grass, Car.

Table 2-7. Confusion Matrix Between Classified Data and Reference Data Using an
RBHI Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       551    14      0     0         0      0    0     53    618
Tree             7   774      0     0         0      0    0      0    781
Grass           15     0    684     0        24      6    1      0    730
Road             1     0      6   304        42      3    0      0    356
Sidewalk        22     0      4     0       732     15   80      1    854
Brick            0     0      0     0         0    134    0      0    134
Car             13     0      0     0         0      0  117      0    130
Metal           11     0      0     0         0      0    0     48     59
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3344/3662) = 91.3162%
Kappa Coefficient = 0.8947

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Road      100.00        85.39         304/304            304/356
Sidewalk  91.73         85.71         732/798            732/854
Car       59.09         90.00         117/198            117/130
Grass     98.56         93.70         684/694            684/730
Tree      98.22         99.10         774/788            774/781
Building  88.87         89.16         551/620            551/618
Metal     47.06         81.36         48/102             48/59
Brick     84.81         100.00        134/158            134/134

Table 2-8. Confusion Matrix Between Classified Data and Reference Data Using an
RGBHI Combination (Pixels)

CLASS     Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building       552    11      0     3         0      0    0     54    620
Tree            22   777      0     0         0      0    0      1    800
Grass           14     0    684     0        26      6    1      0    731
Road             1     0      6   299        41      3    0      0    350
Sidewalk        23     0      4     2       731      4   97      2    863
Brick            0     0      0     0         0    145    0      0    145
Car              1     0      0     0         0      0  100      0    101
Metal            7     0      0     0         0      0    0     45     52
Total          620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3333/3662) = 91.0158%
Kappa Coefficient = 0.8909

Class  Producer’s  Accuracy 

User’s  Accuracy 

Producer’s  Accuracy 

User’s  Accuracy 

(%) 

(%) 

(Pixels) 

(Pixels) 

Road 

98.36 

85.43 

299/304 

299/350 

Sidewalk 

91.60 

84.70 

731/798 

731/863 

Car 

50.51 

99.01 

100/198 

100/101 

Grass 

98.56 

93.57 

684/694 

684/731 

Tree 

98.60 

97.13 

777/788 

777/800 

Building 

89.03 

89.03 

552/620 

552/620 

Metal 

44.12 

86.54 

45/102 

45/52 

Brick 

91.77 

100.00 

145/158 

145/145 

intensity and DEM. Figure 2-8 (b) shows the maximum likelihood classification of the
RGBHI band combination. The overall accuracy of the RGBHI band combination was
91.0158%, which is lower than the overall accuracies of RGHI and RBHI. The car and
metal classes still had unacceptably low accuracies, and only the tree class had higher
accuracy than in both the RGHI and RBHI combinations. Before processing this band
combination, we expected that fusing all of the information (the color image and the
ALSM intensity and height) would yield higher accuracies for buildings, trees, and other
objects on the ground. The result shows, however, that adding bands may actually
decrease the accuracy of classifying some ground objects, such as buildings.
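
The accuracy measures quoted with each confusion matrix follow directly from the matrix itself. As a minimal sketch (not the software used in this research), the Python function below computes the overall accuracy, the kappa coefficient, and the producer's and user's accuracies for the RBHI matrix of Table 2-7. The function name `accuracy_report` and the use of NumPy are assumptions made for illustration; rows are taken as classified data and columns as reference data.

```python
import numpy as np

# RBHI confusion matrix (Table 2-7): rows = classified, columns = reference.
# Class order: Building, Tree, Grass, Road, Sidewalk, Brick, Car, Metal.
# The Sidewalk/Building entry is taken as 22, the value that reproduces the
# printed row total (854) and column total (620).
RBHI = np.array([
    [551,  14,   0,   0,   0,   0,   0,  53],
    [  7, 774,   0,   0,   0,   0,   0,   0],
    [ 15,   0, 684,   0,  24,   6,   1,   0],
    [  1,   0,   6, 304,  42,   3,   0,   0],
    [ 22,   0,   4,   0, 732,  15,  80,   1],
    [  0,   0,   0,   0,   0, 134,   0,   0],
    [ 13,   0,   0,   0,   0,   0, 117,   0],
    [ 11,   0,   0,   0,   0,   0,   0,  48],
])


def accuracy_report(cm):
    """Return (overall, kappa, producer's, user's) for a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)             # correctly classified pixels per class
    row_tot = cm.sum(axis=1)       # pixels assigned to each class
    col_tot = cm.sum(axis=0)       # reference pixels in each class
    overall = diag.sum() / n
    expected = (row_tot * col_tot).sum() / n ** 2   # chance agreement
    kappa = (overall - expected) / (1.0 - expected)
    producers = diag / col_tot     # 1 - omission error
    users = diag / row_tot         # 1 - commission error
    return overall, kappa, producers, users


overall, kappa, producers, users = accuracy_report(RBHI)
print(f"overall = {overall:.4%}, kappa = {kappa:.4f}")
```

The producer's accuracy divides each diagonal entry by its column (reference) total, and the user's accuracy divides it by its row (classified) total, which is why a class such as car can have a low producer's accuracy and a high user's accuracy at the same time.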




Figure 2-8. Maximum likelihood classification: (a) RBHI (red, blue, height, and
intensity); (b) RGBHI (red, green, blue, height, and intensity) band combinations.


Discussion 

Among the trial band combinations of the RGB color image and the HI image, the
RBHI band combination had the highest overall accuracy in the maximum likelihood
classification. The overall accuracies of each band combination are compared in Table 2-9.
After more bands were added, the accuracies of some classes decreased, which lowered
the overall classification accuracy for some band combinations, such as RGBHI. The
simple assumption of data fusion is that more data provide better identification
accuracy. However, the result from the RGBHI band combination shows that this
assumption may not always prove correct.

Table 2-9. Overall Accuracy of Eight Band Combinations (%)

Bands   Building  Tree   Grass  Road    Sidewalk  Brick  Car    Metal  Overall
RGB     87.74     63.96  56.63  54.28   65.79     96.20  61.11  65.69  67.48
HI      72.90     80.96  86.02  99.34   51.75     64.56  17.17  46.08  70.59
RGI     72.42     76.40  92.22  68.75   72.81     94.30  61.11  65.69  79.95
RGH     88.55     97.56  79.97  70.39   85.96     92.41  41.41  42.16  83.12
RHI     90.81     97.97  98.56  98.36   84.21     65.19  67.38  47.06  89.43
RGHI    89.35     97.97  98.56  98.03   90.10     91.77  62.12  43.14  91.18
RBHI    88.87     98.22  98.56  100.00  91.73     84.81  59.09  47.06  91.32
RGBHI   89.03     98.60  98.56  98.36   91.60     91.77  50.51  44.12  91.02

The basic problem in remote sensing pattern recognition classification, given a
spectral distribution of data in n bands (here, five bands), is to find an n-dimensional
decision boundary that will allow the separation of the major classes (eight in this work)
(Jensen, 1996). This problem is demonstrated diagrammatically using a red band and
eight classes in Figure 2-9. Table 2-10 shows the mean and standard deviation of each
class in the maximum likelihood classification for the RGBHI band combination. In the
normal distribution graph of the eight classes on the green band (Figure 2-10), the
grass, brick, and road classes overlap closely and cannot be easily separated using the
green band alone. The sidewalk and building classes, as well as the metal and car
classes, have the same problem. The normal distributions on the blue band in Figure 2-11
show that the eight classes are in the same situation as on the green band. In the
normal distribution diagram of intensity (Figure 2-12), the asphalt road and grass
classes are separated more clearly than the other classes, while the metal and car
classes overlap almost all other land-cover classes because of their large standard
deviations. Figure 2-13 shows the normal distribution diagram of the height data. Since
the brick class has a small standard deviation in the height data, the density of its
normal distribution is high. The normal distributions of the other classes on the height
data are shown in Figure 2-14. Generally, the more bands we analyze in a classification,
the greater the amount of redundant spectral information being used. Classes that
overlap closely in their spectral distributions are not easy to separate from each other.


Table 2-10. Statistics Summary of the Maximum Likelihood Classification for the
RGBHI Band Combination

          Red             Green           Blue            Height         Intensity
Class     Mean    Stdv.   Mean    Stdv.   Mean    Stdv.   Mean   Stdv.   Mean    Stdv.
Building  181.12  23.82   143.02  20.95   169.28  20.80   30.67  2.04    135.49  17.82
Tree      84.73   13.22   66.80   8.68    84.02   7.71    34.22  3.15    116.16  47.69
Grass     116.06  10.17   89.07   6.28    99.42   6.84    20.47  1.71    264.86  25.65
Road      122.56  13.45   93.41   9.66    114.69  10.92   21.24  1.04    75.95   15.92
Sidewalk  153.92  19.78   117.43  16.68   137.72  16.66   19.99  1.27    155.38  14.53
Brick     148.19  4.22    92.98   3.30    106.24  6.34    22.28  0.03    175.88  32.05
Car       253.69  2.77    212.95  3.23    232.80  8.57    21.86  0.57    234.11  94.70
Metal     242.34  9.16    207.10  5.95    219.25  11.73   26.09  0.23    168.20  91.76


Figure 2-9. Diagram of normal distribution of eight classes on a red band.


Figure 2-10. Diagram of normal distribution of eight classes on a green band.
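
The decision-boundary problem described above can be made concrete with the class statistics of Table 2-10. The sketch below is a simplified illustration, not the classifier used in this research: it assumes that the five bands are statistically independent within each class (a diagonal covariance matrix), whereas a full maximum likelihood classifier uses the complete covariance matrix of each class. The names `STATS`, `log_likelihood`, and `classify` are hypothetical.

```python
import math

# (mean, standard deviation) per band, in the order
# red, green, blue, height, intensity, copied from Table 2-10.
STATS = {
    "Building": [(181.12, 23.82), (143.02, 20.95), (169.28, 20.80), (30.67, 2.04), (135.49, 17.82)],
    "Tree":     [(84.73, 13.22), (66.80, 8.68), (84.02, 7.71), (34.22, 3.15), (116.16, 47.69)],
    "Grass":    [(116.06, 10.17), (89.07, 6.28), (99.42, 6.84), (20.47, 1.71), (264.86, 25.65)],
    "Road":     [(122.56, 13.45), (93.41, 9.66), (114.69, 10.92), (21.24, 1.04), (75.95, 15.92)],
    "Sidewalk": [(153.92, 19.78), (117.43, 16.68), (137.72, 16.66), (19.99, 1.27), (155.38, 14.53)],
    "Brick":    [(148.19, 4.22), (92.98, 3.30), (106.24, 6.34), (22.28, 0.03), (175.88, 32.05)],
    "Car":      [(253.69, 2.77), (212.95, 3.23), (232.80, 8.57), (21.86, 0.57), (234.11, 94.70)],
    "Metal":    [(242.34, 9.16), (207.10, 5.95), (219.25, 11.73), (26.09, 0.23), (168.20, 91.76)],
}


def log_likelihood(pixel, band_stats):
    """Gaussian log-likelihood of a pixel, assuming independent bands.

    The constant term -0.5*log(2*pi) per band is omitted because it is the
    same for every class and does not affect the comparison.
    """
    total = 0.0
    for value, (mean, std) in zip(pixel, band_stats):
        z = (value - mean) / std
        total += -math.log(std) - 0.5 * z * z
    return total


def classify(pixel):
    """Assign the pixel to the class with the largest log-likelihood."""
    return max(STATS, key=lambda name: log_likelihood(pixel, STATS[name]))


# A pixel with grass-like color, ground-level height, and high NIR intensity.
print(classify((116.0, 89.0, 99.0, 20.5, 265.0)))
```

Under this simplification each band contributes one squared, variance-scaled distance to the decision, so a band in which two classes have nearly the same mean adds little discriminating information for that pair of classes.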

When the RGBHI band combination is compared to the RBHI band combination, the
accuracies of the road and sidewalk classes decreased because these two classes are
close in the green band and share much common area in their spectral distributions.
Because of this problem, simply combining bands and extending the number of bands
that are spectrally close may not be the best solution for classification.
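
One way to put a number on how closely two classes overlap in a single band is the Bhattacharyya coefficient between their normal distributions, which is 1 for identical distributions and approaches 0 for well-separated ones. This measure is not used in the analysis above; it is offered only as an illustrative sketch, with the means and standard deviations taken from Table 2-10 and a hypothetical function name.

```python
import math


def bhattacharyya_coefficient(mean1, std1, mean2, std2):
    """Overlap of two univariate normal distributions (1 = identical)."""
    var_sum = std1 ** 2 + std2 ** 2
    distance = (0.25 * (mean1 - mean2) ** 2 / var_sum
                + 0.5 * math.log(var_sum / (2.0 * std1 * std2)))
    return math.exp(-distance)


# Grass versus road, (mean, std) from Table 2-10.
green = bhattacharyya_coefficient(89.07, 6.28, 93.41, 9.66)
intensity = bhattacharyya_coefficient(264.86, 25.65, 75.95, 15.92)
print(f"grass/road overlap: green = {green:.3f}, intensity = {intensity:.6f}")

# Road versus sidewalk in the green and intensity bands.
rs_green = bhattacharyya_coefficient(93.41, 9.66, 117.43, 16.68)
rs_intensity = bhattacharyya_coefficient(75.95, 15.92, 155.38, 14.53)
print(f"road/sidewalk overlap: green = {rs_green:.3f}, intensity = {rs_intensity:.3f}")
```

For both class pairs the overlap in the green band is far larger than in the intensity band, which is consistent with the observation that the green band adds little separating power for these classes.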


Figure 2-13. Diagram of normal distribution of eight classes on a height band.

Figure 2-14. Diagram of normal distribution of seven classes on a height band.

In Chapters 3 and 4, we investigate new approaches to fusing the two different data sets.
In the two approaches, the expert system and the Dempster-Shafer evidential theory, we
classify each image separately and then apply data fusion algorithms.

Summary 

By combining the relatively high-resolution digital color images from the ADP system
with the near infrared (NIR) intensity images and DEM produced from the ALSM system,
images can be developed that have both excellent spatial resolution and five bands: red,
green, blue, NIR, and DEM.

For the classification of ground objects, a supervised pattern recognition technique
(maximum likelihood classification) was applied to classify eight classes (building,
tree, grass, road, sidewalk, brick, car, and metal) on an RGB image and an HI image
separately. The accuracy assessments of the two images revealed that each has useful
data for classifying some objects, such as concrete buildings and brick on the RGB
image and grass and asphalt roads on the HI image.

To investigate the potential of combining the two types of images, we combined the five
available bands into six band combination images: RGI, RGH, RHI, RGHI, RBHI,
and RGBHI. Comparing the accuracy assessments of those six band combination images,
the RBHI image had the highest overall accuracy, 91.32%, but no band combination
image classified the car and metal classes with acceptable accuracy.

However, the RGBHI band combination image had a lower accuracy than RBHI in
this research. Adding the green band to the RBHI image decreased the overall
accuracy. This problem resulted from large areas of overlap in the spectral
distributions of some classes in the green band. These overlapping classes had
lower classification accuracies, which decreased the overall accuracy of the
RGBHI band combination. This experiment shows that simply combining two sets
of data and extending the number of bands may not be the best way to improve the
recognition of objects.


CHAPTER  3 

CLASSIFICATION USING AN EXPERT SYSTEM APPROACH

Image interpretation produces a high-level description of the three-dimensional
environment from which the image was taken. Through interpretation, we try to
understand the image by identifying important features or objects and analyzing them in
the context of the scene (Kopparapu & Desai, 2001). In this way, a human analyst moves
easily from data to information in high-level decision processing. While working on an
image, the analyst may take into consideration other data, such as available thematic
maps, personal field experience, and common sense. The objective of an image
interpretation system is to make the computer do the same task. Many researchers have
sought to develop automated systems that replicate the human process. One obvious
approach is to generate a number of decision rules that mimic human logical
processes. A system of rules designed for a specific purpose is often called an
expert system (Winston, 1999). In this chapter, we explore the use of an expert system for
sensor fusion.

Rule-based  Classification 

In traditional methods of land cover classification, the primary determinants of
classification detail are the spectral and spatial resolution of the imagery.
If additional data from different sensors can be fused with a given combination of
spectral and spatial resolutions, it might be possible to achieve either greater
classification detail or greater classification accuracy (Lawrence & Wright, 2001) or to
produce a higher level of data abstraction. This goal is important because it addresses the
central task of moving from data to information.

In many cases, the proper addition of ancillary data to spectral data has been shown to
lead to greater class distinctions (for example, Strahler et al., 1978; Hutchinson, 1982;
Trotter, 1991; Jensen, 1996). Combining several different data types for image
classification can increase its accuracy and precision.

We investigated the supervised statistical pattern recognition technique in Chapter 2. In
this chapter, we investigate one decision-level data fusion method for combining two
data sets. Decision-level fusion is shown in Figure 3-1 (Tuell et al., 2001). There are
several ways in which an expert system can be implemented for decision-level sensor
fusion (Sell, 1985; Frost, 1986). The simplest, and perhaps the most common, is the
generation of a set of production rules. These rules are implemented as a
succession of IF/THEN statements.

[Diagram labels: DATA, FEATURE, DECISION, INFORMATION; Decision Level Fusion]

Figure 3-1. Decision-level data fusion (Tuell et al., 2001).

An example of a production rule for our sensor fusion research is:

IF:  the  pixel  value  of  the  camera’s  red  band  is  less  than 
threshold  1,  and  the  corresponding  ALSM  intensity 
value  is  larger  than  threshold  2,  and  the  height  of  the 
objects  is  higher  than  threshold  3, 

THEN:  there  is  evidence  that  the  identity  of  the  object  is 
Type  1. 
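
The production rule above can be sketched as a single Boolean test. The code below is a minimal illustration in Python rather than the IDL implementation used in this research; the `Pixel` structure, the function name, and the threshold values in the usage example are hypothetical, since the rule is stated only in terms of unspecified thresholds 1, 2, and 3.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    red: float        # camera red-band pixel value
    intensity: float  # corresponding ALSM intensity value
    height: float     # object height from the ALSM data


def is_type1(pixel, threshold1, threshold2, threshold3):
    """IF red < threshold 1 AND intensity > threshold 2 AND height > threshold 3
    THEN there is evidence that the object is Type 1."""
    return (pixel.red < threshold1
            and pixel.intensity > threshold2
            and pixel.height > threshold3)


# Hypothetical thresholds, chosen only to exercise the rule.
sample = Pixel(red=85.0, intensity=240.0, height=30.0)
print(is_type1(sample, threshold1=100.0, threshold2=200.0, threshold3=23.0))
```

Each additional condition in the IF part narrows the set of pixels for which the rule fires, so a rule set is usually built from several such tests that together cover all pixels.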

Figure 3-2 shows the processing diagram for the production rule-based classification
used to fuse data from the different sensors in this research. In this chapter,
rule-based classification is performed separately on the digital RGB color images from
the ADP system and on the HI image from the ALSM system. The two classifications
are then combined to create new classes.


Figure 3-2. The diagram of rule-based data fusion for two sensors.

Analysis of Study Area

The study area contains buildings, concrete sidewalks, asphalt roads, grass and trees,
cars, a brick median on the road, and metal pipes and tables on the roofs. The elevation
changes slightly along the asphalt roads, but most areas have almost the same elevation.
Figure 3-3 shows the north-south profile (A) and the east-west profile (B) on the DEM of
the study area.

Along profile A, the north end is a few meters higher than the south end, and the
ellipsoid height of the asphalt road at the north end is approximately 22.5 m. Figure 3-4
shows the elevation change along the north-south profile. Along profile B, the elevation
rises toward the east end, and there are tall buildings and trees on the grass (see
Figure 3-5). Based on the elevation changes and the pixel values of the digital color
images, the rules for classifying this area were developed both spectrally and spatially.


Figure 3-3. DEM of test area from ALSM (A and B are the elevation change profiles).

Figure 3-4. Elevation from north (left) to south (right) at study area (see Profile “A” in
Figure 3-3).

Figure 3-5. Elevation from west (left) to east (right) of study area (see Profile “B” in
Figure 3-3).

In this study, the following data are used:

• DEM  (Digital  Elevation  Model)  data 

• Intensity imagery (proportional to received power at the laser wavelength,
λ = 1064 nm)

• Digital  color  imagery  from  the  ADP  system  (consisting  of  red,  green,  and 
blue  bands) 

These three sets of data from two different sensors are fused for rule-based
classification in this study. The DEM from the ALSM system is shown in
Figure 3-3. The digital color image and the intensity image are shown in Figure 3-6.

Methods 

Two digital color images were chosen for the study area; they were the most
shadow-free scenes available. The data from the ALSM system were collected
simultaneously with the digital color images and processed using REALM (Topscan). In
the study area, five dominant land cover classes can be distinguished easily: vegetation
(grass and trees), concrete sidewalk, asphalt road, metal, and brick. For each cover type,
several sample sites were identified in both the RGB image and the ALSM intensity
image. The spectral ranges of the RGB image are shown in Figure 3-7, and the range of
ALSM intensities for each land cover type is shown in Figure 3-8. Figure 3-7 presents
the radiance distribution of the red, green, and blue bands for the five land-cover
classes. The metal class has higher pixel values than the other classes. Vegetation,
concrete, asphalt, and brick fall in almost the same range of pixel values, so they cannot
be clearly distinguished from each other using RGB images alone.


Figure 3-6. Image data of study area: (a) digital color image from the ADP system; (b)
near infrared intensity image from the ALSM system.

The near infrared intensity image from the ALSM was helpful for classifying the
vegetation, asphalt, and metal classes. In the intensity distribution for each land-cover
class in Figure 3-8, most vegetation, concrete, and brick areas were above an intensity
value of 100, and the asphalt areas were below 100. Concrete and brick were mixed at
intensity values from 100 to 200; however, metal can be separated from the other
materials because its intensity values are higher than 320.

We used these data distributions from the sample sites to generate production rules for
the classification of the three-band (red, green, and blue) RGB images and for the
classification of the two-band HI (height and intensity) images. These rules were
programmed in the IDL language. We call the classification rules for each image the
level-1 classification rules. Using the level-1 classification rules for the HI image from
the ALSM system, five classes were created: tall objects, asphalt, concrete, vegetation,
and metal. The level-1 classification rules for the RGB images from the ADP system
were developed to produce six classes: vegetation, concrete, metal, brick, a mix of
asphalt and vegetation, and a mix of asphalt and concrete. The two level-1 classification
schemes for the RGB image and the HI image are shown in Table 3-1.


Figure 3-7. The radiance distribution of the red, green, and blue bands for the five
land-cover classes.

Figure 3-8. ALSM intensity distribution for each land-cover class.

In the rules for the level-1 classification of the HI image, all pixels with an elevation
higher than 23.0 m in the height band were classified as tall objects, such as a building
or tree. The remaining pixels were classified as asphalt, concrete, vegetation, or metal
by their intensity values, according to the analysis of the intensity distribution of each
class in Figure 3-8. The rules for the level-1 classification of the ALSM data image are
given in Table 3-2.

The rules for the level-1 classification of the RGB image were created from the radiance
distribution of each band in Figure 3-7. The range of blue pixel values differs from class
to class, so the first rules were based on the pixel value of blue. Detailed rules based on
the pixel values of red and green were then created to separate each class. For example,
most vegetation, asphalt, and brick pixels fall in the blue radiance range between 100
and 200, but brick can be distinguished by its higher red radiance value. The level-1
classification rules for the RGB image are shown in Table 3-3.


Table 3-1. Level-1 Classification Scheme Used in an RGB Image and an HI Image

HI image        RGB image
Tall Objects    Vegetation
Vegetation      Concrete
Concrete        Asphalt or Vegetation
Asphalt         Metal
Metal           Brick
                Asphalt or Concrete


Table 3-2. Rules of Level-1 Classification for the HI Image

IF (ELEVATION > 23.0 M) THEN
    HI_CLASS = TALL OBJECTS
ELSE
    IF (INTENSITY < 100.0) THEN
        HI_CLASS = ASPHALT
    IF (100.0 < INTENSITY < 200.0) THEN
        HI_CLASS = CONCRETE
    IF (200.0 < INTENSITY < 320.0) THEN
        HI_CLASS = VEGETATION
    IF (INTENSITY > 320.0) THEN
        HI_CLASS = METAL
ENDIF
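
The level-1 rules of Tables 3-2 and 3-3 can be sketched as two small functions. This is an illustrative Python version, not the IDL code used in this research, and the function names are hypothetical. The tables use strict inequalities and therefore leave pixels that fall exactly on a threshold (for example, an intensity of exactly 100.0) unassigned; the sketch assumes that such values belong to the upper interval.

```python
def classify_hi(elevation_m, intensity):
    """Level-1 class of an HI (height and intensity) pixel, after Table 3-2."""
    if elevation_m > 23.0:
        return "TALL OBJECTS"
    if intensity < 100.0:
        return "ASPHALT"
    if intensity < 200.0:      # assumption: 100.0 itself falls in this bin
        return "CONCRETE"
    if intensity < 320.0:      # assumption: 200.0 itself falls in this bin
        return "VEGETATION"
    return "METAL"             # assumption: 320.0 itself counts as metal


def classify_rgb(red, green, blue):
    """Level-1 class of an RGB pixel, after Table 3-3."""
    if blue < 100.0:
        return "VEGETATION"
    if blue < 200.0:           # assumption: 100.0 itself falls in this bin
        if green < 100.0:
            return "BRICK" if red > 140.0 else "ASPHALT OR VEGETATION"
        return "ASPHALT OR CONCRETE"
    # Blue of 200.0 or more (assumption: 200.0 itself is included here).
    return "METAL" if green > 200.0 else "CONCRETE"


print(classify_hi(21.2, 76.0))             # low ground point, weak NIR return
print(classify_rgb(148.0, 93.0, 106.0))    # reddish pixel with moderate blue
```

Writing the intensity tests as a chain of upper bounds guarantees that every pixel receives exactly one class, which the table form achieves only when no pixel lies exactly on a threshold.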


Table 3-3. Rules of Level-1 Classification for Digital Color Image

IF (BLUE < 100.0) THEN RGB_CLASS = VEGETATION
IF (100.0 < BLUE < 200.0) THEN BEGIN
    IF (GREEN < 100.0) THEN BEGIN
        IF (RED > 140.0) THEN
            RGB_CLASS = BRICK
        ELSE
            RGB_CLASS = ASPHALT OR VEGETATION
        ENDIF
    ELSE
        RGB_CLASS = ASPHALT OR CONCRETE
    ENDIF
ENDIF
IF (BLUE > 200.0) THEN
    IF (GREEN > 200.0) THEN
        RGB_CLASS = METAL
    ELSE
        RGB_CLASS = CONCRETE
    ENDIF
ENDIF


Based on the two level-1 classifications from the RGB image and the HI image, the rules
for fusing the data from each sensor were created to produce a level-2 classification that
presents information from the two combined data sets. The level-2 classification rules
created eight classes: grass, tree, building, sidewalk, road, median, car, and metal object
on the roof, such as a pipe or table. The level-2 classification scheme is shown in
Table 3-4. Clearly, the fusion of the two types of data can be used to generate a better
classification. Brick cannot be classified using the HI image but can be classified using
the RGB image data. Asphalt can be classified using the HI image but cannot be
classified using the RGB image. After the two data sets were combined using production
rules, the level-2 classification image classified brick and asphalt successfully. With the
tall objects class from the level-1 classification of the HI image, the level-2
classification can separate vegetation into tree and grass, and concrete into building and
sidewalk. Table 3-5 shows the rules for the level-2 classification.


Table 3-4. Level-2 Classification Scheme after Data Fusion Using Production Rules

Level-2 classes
Grass
Tree
Building
Concrete Sidewalk
Asphalt Road & Parking lot
Brick median
Car
Metal Pipe on the roof

Table 3-5. Level-2 Production Rules for Data Fusion of ALSM and ADP Level-1
Classification Data

IF (RGB_CLASS = VEGETATION) THEN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = TREES
    IF (HI_CLASS = ASPHALT) THEN
        CLASS = ROADS
    IF (HI_CLASS = CONCRETE) THEN
        CLASS = SIDEWALK
    IF (HI_CLASS = VEGETATION OR METAL) THEN
        CLASS = GRASS
ENDIF

IF (RGB_CLASS = CONCRETE) THEN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = BUILDING
    IF (HI_CLASS = ASPHALT) THEN
        CLASS = ROADS
    IF (HI_CLASS = CONCRETE) THEN
        CLASS = SIDEWALK
    IF (HI_CLASS = VEGETATION OR METAL) THEN
        CLASS = CAR
ENDIF

IF (RGB_CLASS = ASPHALT OR VEGETATION) THEN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = TREES
    IF (HI_CLASS = ASPHALT) THEN
        CLASS = ROADS
    IF (HI_CLASS = CONCRETE) THEN
        CLASS = SIDEWALK
    IF (HI_CLASS = VEGETATION OR METAL) THEN
        CLASS = GRASS
ENDIF

IF (RGB_CLASS = METAL) THEN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = METAL ON THE ROOF
    ELSE
        CLASS = CAR
    ENDIF
ENDIF

IF (RGB_CLASS = BRICK) THEN BEGIN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = BUILDING
    ELSE
        CLASS = MEDIAN
    ENDIF
ENDIF

IF (RGB_CLASS = ASPHALT OR CONCRETE) THEN
    IF (HI_CLASS = TALL OBJECTS) THEN
        CLASS = BUILDING
    IF (HI_CLASS = ASPHALT) THEN
        CLASS = ROADS
    IF (HI_CLASS = CONCRETE) THEN
        CLASS = SIDEWALK
    IF (HI_CLASS = VEGETATION OR METAL) THEN
        CLASS = SIDEWALK
ENDIF
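
Because every level-2 rule in Table 3-5 maps one pair of level-1 classes to one output class, the whole rule set can also be written as a lookup table. The sketch below is an illustrative Python restatement, not the IDL implementation; the names `LEVEL2`, `_row`, and `fuse` are hypothetical, and the class labels are the strings used in Table 3-5.

```python
HI_CLASSES = ("TALL OBJECTS", "ASPHALT", "CONCRETE", "VEGETATION", "METAL")


def _row(tall, other):
    """Rules shared by four RGB classes: asphalt and concrete in the HI image
    always give roads and sidewalk; only the tall-object and the
    vegetation-or-metal outcomes differ."""
    return {
        "TALL OBJECTS": tall,
        "ASPHALT": "ROADS",
        "CONCRETE": "SIDEWALK",
        "VEGETATION": other,
        "METAL": other,
    }


# LEVEL2[rgb_class][hi_class] -> level-2 class, transcribed from Table 3-5.
LEVEL2 = {
    "VEGETATION": _row("TREES", "GRASS"),
    "CONCRETE": _row("BUILDING", "CAR"),
    "ASPHALT OR VEGETATION": _row("TREES", "GRASS"),
    "ASPHALT OR CONCRETE": _row("BUILDING", "SIDEWALK"),
    "METAL": {h: "METAL ON THE ROOF" if h == "TALL OBJECTS" else "CAR"
              for h in HI_CLASSES},
    "BRICK": {h: "BUILDING" if h == "TALL OBJECTS" else "MEDIAN"
              for h in HI_CLASSES},
}


def fuse(rgb_class, hi_class):
    """Combine the two level-1 labels of a pixel into its level-2 class."""
    return LEVEL2[rgb_class][hi_class]


print(fuse("VEGETATION", "TALL OBJECTS"))   # vegetation that is tall
print(fuse("BRICK", "ASPHALT"))             # brick at ground level
```

A table of this kind makes it easy to check that every combination of level-1 classes has exactly one outcome, and to change a single rule when the data set or study area changes.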


Results 

We generated five rules to produce the level-1 classification of the HI image and nine
rules for the level-1 classification of the RGB image. The level-1 classification of the
RGB image clearly shows man-made objects, such as buildings and concrete sidewalks,
as shown in Figure 3-9. In this classification, asphalt was not clearly distinguished from
the vegetation or concrete classes.

The level-1 classification of the HI image clearly shows tall objects and vegetation
because of its height and near infrared intensity bands (see Figure 3-10). The asphalt
class in this classification was separated from other objects because of its distinguishable
reflectance value. Because of the characteristics of the intensity band, the classification
of the HI image cannot separate brick from concrete and does not produce clear object
boundaries when compared to the level-1 classification of the RGB image.


Figure 3-9. Rule-based classification map: level-1 classification of the RGB image (red,
green, and blue color image).

Figure 3-10. Rule-based classification map: level-1 classification of the HI image (height
and intensity bands).

For the fusion of the two data sets, 26 rules were created to produce a level-2
classification with eight information classes. The level-2 classification is shown in
Figure 3-11. This image shows a detailed classification that was not possible from either
image alone. It has a brick class that was not shown in the level-1 classification of the
HI image, and a clearer asphalt class that was not plainly distinguished in the level-1
classification of the RGB image. The fusion of the two classification images also gives
clear separation between tree and grass, metal on the roof and car, and concrete building
and concrete sidewalk classes.


Figure 3-11. Classification level-2 image after combining the RGB image and the HI
image using production rules.

In the level-2 classification, however, some trees near the concrete sidewalk were
classified as building. Small tree branches over the concrete sidewalk were classified
as sidewalk or as a mix of asphalt and vegetation in the level-1 classification of the RGB
image because they do not fully cover the sidewalk. They can, however, appear as tall
objects in the HI image when laser pulses from the ALSM system hit the branches rather
than the concrete sidewalk.

Table 3-6 shows the confusion matrix analysis of the rule-based level-2 classification.
Overall accuracy for level-2 was 92.7362%, and the individual class producer’s accuracies
ranged from 68.35% for brick to 98.86% for tree. The brick and metal classes have low
producer’s accuracies, but the user’s accuracies of all classes are over 85%.


Table 3-6. Confusion Matrix of Rule-based Classification (Pixels)

CLASS      Building  Tree  Grass  Road  Sidewalk  Brick  Car  Metal  Total
Building        562     9      0     0         0      0    0     24    595
Tree             16   779     30     0         0      0    0      0    825
Grass             0     0    640     0         3      5    0      0    648
Road              0     0      0   276        16     10    6      0    308
Sidewalk         31     0     24    28       779     35   18      0    915
Brick             0     0      0     0         0    108    0      0    108
Car              11     0      0     0         0      0  174      0    185
Metal             0     0      0     0         0      0    0     78     78
Total           620   788    694   304       798    158  198    102   3662

Overall Accuracy = (3396/3662) 92.7362%    Kappa Coefficient = 0.9119

Class     Producer’s    User’s        Producer’s         User’s
          Accuracy (%)  Accuracy (%)  Accuracy (Pixels)  Accuracy (Pixels)
Grass     92.22         98.77         640/694            640/648
Tree      98.86         94.42         779/788            779/825
Building  90.65         94.45         562/620            562/595
Sidewalk  97.62         85.14         779/798            779/915
Car       87.88         94.05         174/198            174/185
Metal     76.47         100.00        78/102             78/78
Road      90.79         89.61         276/304            276/308
Brick     68.35         100.00        108/158            108/108


Discussion

Many classification methods are available for image processing in remote sensing,
and no single classification solution always performs best. In this research, we
applied an expert system based on production rules to classify ground objects using two
different data sources.

Production rules are easy to make and apply, but the key problem in applying this approach to the classification and fusion of different data sources is how to make reasonable rules. Since the technique depends entirely on rules created by users or operators, the initial data set should be carefully analyzed to find applicable rules. An additional step may be needed to understand the data set and to analyze the geometric characteristics of the study areas. Every rule is suited to specific data sets and study areas. When the data set or the study area changes, new rules should be created case by case.

Summary 

To achieve greater classification detail and accuracy, two data sets from different sensors (the ALSM and ADP systems), namely the RGB digital color image and the HI (height and intensity) image, were fused with a given combination of spectral and spatial resolution by using an expert system. The simplest and perhaps most common form of expert system is a set of production rules. These rules are implemented as a succession of IF/THEN statements. Production rules are easy to make and apply, but the key problem in applying them to the classification and fusion of different data sources is how to make reasonable rules.

We set two levels for the classification and fusion research using the expert system. Level-1 is the classification of each image using production rules, and level-2 is the fusion of the data from the level-1 classifications. We generated five rules for the level-1 classification of the HI image and nine rules for the level-1 classification of the RGB image. For the fusion of the two data sets, 26 rules were created for the level-2 classification to classify eight land-cover classes: grass, tree, building, sidewalk, road, median, car, and metal object on the roof, such as a pipe or table.

The classification image from level-2 has 92.7362% overall accuracy, and individual producer’s accuracies ranged from 68.35% for brick to 98.86% for tree. The result shows that the fusion of data from different sources can share redundant and complementary information and can provide greater classification detail and accuracy. Importantly, this experiment also demonstrates the process of moving from data to information.


CHAPTER  4 

CLASSIFICATION  USING  THE  DEMPSTER-SHAFER  EVIDENTIAL  THEORY 
In  this  chapter,  we  present  the  Dempster-Shafer  evidential  theory  and  investigate  its 
application  for  fusion  of  remote  sensing  data.  The  low  resolution  near  infrared  images, 
the  elevation  data  from  the  ALSM  system,  and  the  digital  images  from  the  ADP  system 
will  be  combined  using  the  Dempster-Shafer  evidential  theory. 

The  Dempster-Shafer  Evidential  Theory 
Dempster-Shafer evidential theory is one possible method for decision-level data fusion, as explained in Chapter 1. Figure 4-1 shows the diagram for decision-level fusion provided by Tuell et al. (2001).

Figure 4-1. Decision-level data fusion (Tuell et al., 2001).

Dempster and Shafer presented a generalization of Bayesian theory for a general level of uncertainty (Lowrance & Garvey, 1982). The Dempster-Shafer (D-S) approach tries to follow the way humans assign evidence to hypothetical propositions. The Dempster-Shafer method points out that humans assign measures of belief to combinations of hypotheses (that is, to propositions rather than hypotheses), rather than assigning evidence (that is, probabilities) to a set of mutually exclusive and exhaustive hypotheses (Hall, 1992).

In the Dempster-Shafer method, probability intervals and uncertainty intervals are used to determine the likelihood of hypotheses based on multiple pieces of evidence. Evidential reasoning allows information from each sensor to be combined at its own level of detail. For example, one sensor may provide information that can be used to distinguish the height of objects, whereas another sensor may be able to distinguish only the shape of objects. In the Bayesian approach, all propositions (for example, objects in the environment) for which there is no information are assigned an equal a priori probability. When the number of unknown propositions is large relative to the number of known propositions and additional sensor data become available, the main problem of the Bayesian approach is that the probabilities of the known propositions become unstable, which makes the result unreliable (Abidi & Gonzalez, 1992). The Dempster-Shafer evidential theory was developed in an attempt to overcome some limitations of probability theory. Many researchers have explored the application of Dempster-Shafer evidential reasoning in multisensor target identification and in military command and control (Bogler, 1987; Waltz & Buede, 1989; Buede, 1988). We investigate the Dempster-Shafer evidential theory as a possible approach to data fusion in remote sensing.

The  Concept  of  Evidential  Reasoning 

The concept of the Dempster-Shafer evidential theory, following Hall (1992) and Abidi and Gonzalez (1992), is introduced below.

To understand the Dempster-Shafer evidential theory, we should know what hypotheses and propositions are. According to Hall (1992), a hypothesis is a fundamental statement about nature (for example, an object is a building). A proposition may be either a hypothesis or a combination of hypotheses. Propositions may contain overlapping or conflicting hypotheses (for example, proposition 1 = the object is a building, proposition 2 = the object is a building or tree, proposition 3 = the object is a tree). Proposition 1 overlaps with proposition 2, proposition 2 overlaps with proposition 3, and proposition 1 conflicts with proposition 3. A set of n mutually exclusive and exhaustive elemental propositions about a target area has the form

$$P = \{A_1, A_2, \ldots, A_n\}$$

When P denotes the set of n elemental propositions, the number of general propositions formed by Boolean combinations of the original set is $2^n - 1$:

$$\{A_1 \lor A_2,\; A_1 \lor A_3,\; \ldots\}$$

where the symbol $\lor$ represents a Boolean OR.

The key concept of the Dempster-Shafer theory is the representation of assigned evidence, which is called a probability mass: $m(A_1)$, $m(A_2)$, $m(A_1 \lor A_2)$, and so forth. The sum of all probability masses assigned to both elementary and general propositions should be one.

The Dempster-Shafer evidential theory also defines the concept of an evidential interval, which is the difference between the plausibility of a proposition and the support for that proposition.

The support function for a proposition [for example, Support($A_1$)] is the sum of the probability masses assigned to that proposition, counted over both elementary and general propositions. For a simple proposition, the support is simply the probability mass of that proposition. For a general proposition, for example $\theta_i = A_1 \lor A_2 \lor A_3$, the support for $\theta_i$ is the sum of the probability masses contributing to all elements of $\theta_i$:

$$\mathrm{Support}(A_1 \lor A_2 \lor A_3) = m(A_1) + m(A_2) + m(A_3) + m(A_1 \lor A_2) + m(A_1 \lor A_3) + m(A_2 \lor A_3) + m(A_1 \lor A_2 \lor A_3)$$

The plausibility of a proposition is one minus the evidence supporting its negation:

$$\mathrm{Plausibility}(A_1) = 1 - \mathrm{Support}(\bar{A}_1)$$

In a similar manner, the “doubt” function and the evidential interval are defined as

$$\mathrm{Doubt}(A_1) = \mathrm{Support}(\bar{A}_1)$$

$$\text{Evidential interval}(A_1) = \mathrm{Plausibility}(A_1) - \mathrm{Support}(A_1)$$
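The definitions above can be illustrated with a minimal Python sketch. It assumes that propositions are represented as frozensets of elemental hypotheses, so that a Boolean OR becomes a set union and the negation of a proposition is its complement in the frame. The function names and the mass values are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
def support(masses, proposition):
    """Sum the masses of every focal set contained in `proposition`."""
    return sum(m for focal, m in masses.items() if focal <= proposition)


def doubt(masses, proposition, frame):
    """Support for the negation (set complement) of `proposition`."""
    return support(masses, frame - proposition)


def plausibility(masses, proposition, frame):
    """One minus the evidence supporting the negation."""
    return 1.0 - doubt(masses, proposition, frame)


def evidential_interval(masses, proposition, frame):
    """Return (support, plausibility) for `proposition`."""
    return (support(masses, proposition),
            plausibility(masses, proposition, frame))


# Hypothetical frame and mass assignment (masses sum to one).
frame = frozenset({"building", "tree", "grass"})
masses = {
    frozenset({"building"}): 0.5,
    frozenset({"tree"}): 0.2,
    frozenset({"building", "tree"}): 0.2,
    frame: 0.1,  # mass left on "anything", i.e. ignorance
}

building = frozenset({"building"})
low, high = evidential_interval(masses, building, frame)
print(f"Support = {low:.2f}, Plausibility = {high:.2f}")
```

In this sketch the mass left on the whole frame is what keeps the interval open: the wider the gap between support and plausibility, the less the evidence commits to the proposition.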

The support function is considered the minimum amount of evidence for the targeted object in a pixel, whereas the plausibility of a proposition is considered the maximum possible evidence. The output of a Dempster-Shafer process is a set of evidential intervals; the true likelihood for the pixel may lie somewhere within each interval. As inputs to the Dempster-Shafer process, the probability masses $m(A_i)$ should be assigned by an observer or by the sensor(s). However, this a priori input is a key problem: it is not easily defined and has no simple answer. This problem is described in the experiment section.

Dempster’s Rule of Combination

For further processing, the probability masses from the individual sensors should be combined to obtain useful information. Dempster defined rules of combination that provide a formalism for combining probability masses from multiple sources, and Dillard summarized a number of rules for fusing probability masses [see, for example, Dempster (1968) and Dillard (1982)].

Following Thomopoulos (1990) and Hall (1992), we show an example in this section. Consider two sensors, $S_1$ and $S_2$. The sensors assign evidence to three propositions:

Proposition 1 ($u_0$): hypothesis A is true
Proposition 2 ($u_1$): hypothesis B is true
Proposition 3 ($u_2$): hypothesis A or B is true

Sensor $S_1$ observes data and assigns mass probabilities $\{m_1(u_0), m_1(u_1), m_1(u_2)\}$ to the three propositions. Likewise, sensor $S_2$ assigns mass probabilities $\{m_2(u_0), m_2(u_1), m_2(u_2)\}$. Table 4-1 summarizes Dempster’s combining rules for this case.

Using Dempster’s rules of combination, the elements of the matrix in Table 4-1 are the joint two-sensor evidence. For individual propositions, the joint probability mass can be expressed simply as the product of the probability masses assigned by each sensor. For example, for proposition $u_0$, sensor $S_1$ assigns mass $m_1(u_0)$, while sensor $S_2$ assigns mass $m_2(u_0)$. The joint probability mass for proposition $u_0$ is simply $m(u_0) = m_1(u_0)\,m_2(u_0)$. This is illustrated in the upper left-hand corner of the matrix in Table 4-1. In this instance, sensors $S_1$ and $S_2$ assign evidence to the same proposition, so the joint assignment can be made directly.

Suppose sensor $S_2$ assigns $m_2(u_0)$ to proposition $u_0$, and sensor $S_1$ assigns $m_1(u_2)$ to proposition $u_2$. Here one proposition overlaps the other, and the joint probability mass assignment is again straightforward. Dempster’s combination rule assigns a joint mass to proposition $u_0$ as follows:

$$m(u_0) = m_1(u_2)\,m_2(u_0)$$

This is illustrated in the lower left-hand corner of the matrix in Table 4-1.

How do we treat the assignment of evidence when a proposition conflicts with other propositions? Suppose sensor $S_1$ assigns evidence $m_1(u_1)$ to proposition $u_1$ and sensor $S_2$ assigns evidence $m_2(u_0)$ to proposition $u_0$. These evidential assignments are in conflict (that is, in essence, $S_1$ declares that hypothesis B is true while $S_2$ declares that hypothesis A is true). Dempster’s rules of combination compute a normalizing factor, C, which is the sum of the products of masses assigned to conflicting propositions. For our two-sensor, three-proposition example, the normalizing factor can be expressed as

$$C = k_{01} + k_{10}$$

Then, Dempster’s rule of combination may be written for two independent sources as

$$m(u_i) = \frac{\sum_{A_i \wedge B_j = u_i} m_1(A_i)\, m_2(B_j)}{1 - C}, \qquad C = \sum_{A_i \wedge B_j = \emptyset} m_1(A_i)\, m_2(B_j)$$

where $\emptyset$ is the empty set, and $u_i$ is a general proposition defined as a Boolean combination of elemental hypotheses $A_i$ and $B_j$.

Table 4-1. Dempster’s Combining Rule [from Thomopoulos (1990)]

S1 \ S2     m2(u0)                      m2(u1)                      m2(u2)
m1(u0)      m(u0) = m1(u0) m2(u0)       k10 = m1(u0) m2(u1)         m(u0) = m1(u0) m2(u2)
m1(u1)      k01 = m1(u1) m2(u0)         m(u1) = m1(u1) m2(u1)       m(u1) = m1(u1) m2(u2)
m1(u2)      m(u0) = m1(u2) m2(u0)       m(u1) = m1(u2) m2(u1)       m(u2) = m1(u2) m2(u2)

The joint probability mass for $u_i$ is the sum of the products of the probability masses (from each sensor) over all pairs of propositions that contain nonconflicting hypotheses $A_i$ and $B_j$. For example, in Table 4-1 there are three contributions to the probability mass for proposition $u_0$. First, when both sensors $S_1$ and $S_2$ assign evidence to proposition $u_0$, the joint probability mass is $m_1(u_0)\,m_2(u_0)$. The second contribution is the joint probability mass $m_1(u_0)\,m_2(u_2)$, which arises when sensor $S_1$ assigns evidence to proposition $u_0$ and sensor $S_2$ assigns evidence to the nonconflicting proposition $u_2$. The third contribution is the joint probability mass $m_1(u_2)\,m_2(u_0)$, which arises when sensor $S_2$ assigns evidence to proposition $u_0$ and sensor $S_1$ assigns evidence to the nonconflicting proposition $u_2$.

These three contributions are summed and corrected for the conflicting evidence, that is, for the products $k_{01}$ and $k_{10}$ that arise when one sensor assigns evidence to proposition $u_0$ while the other assigns evidence to proposition $u_1$. Using these joint probability masses, we may compute the evidential intervals explained above.
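The two-sensor, three-proposition example can be worked through in a short Python sketch of Dempster’s rule. It assumes that each proposition is stored as a frozenset of hypotheses, so that two propositions conflict exactly when their intersection is empty. The function name `combine` and the mass values are hypothetical; they only illustrate the arithmetic of Table 4-1.

```python
def combine(m1, m2):
    """Combine two mass assignments with Dempster's rule.

    Returns the normalized joint masses and the conflict C.
    """
    joint = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b  # proposition supported by both assignments
            if common:
                joint[common] = joint.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # k01 and k10 in Table 4-1
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    fused = {prop: mass / (1.0 - conflict) for prop, mass in joint.items()}
    return fused, conflict


# u0 = "A is true", u1 = "B is true", u2 = "A or B is true".
u0 = frozenset({"A"})
u1 = frozenset({"B"})
u2 = u0 | u1

# Hypothetical masses for sensors S1 and S2 (each set sums to one).
m1 = {u0: 0.6, u1: 0.1, u2: 0.3}
m2 = {u0: 0.5, u1: 0.2, u2: 0.3}

fused, conflict = combine(m1, m2)
for prop, mass in fused.items():
    print(sorted(prop), round(mass, 4))
print("C =", round(conflict, 4))
```

With these values the three contributions to $u_0$ are 0.30, 0.18, and 0.15, the conflict is $C = 0.12 + 0.05 = 0.17$, and the joint mass for $u_0$ becomes $0.63 / 0.83$.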

Experiment

Prior  Probability  from  the  Maximum  Likelihood  Decision  Rule 

To investigate the Dempster-Shafer evidential theory for digital color images from the ADP system and elevation and near infrared images from the ALSM system, we first require prior probabilities to calculate the mass assignments of each class for each pixel.

In this research, maximum likelihood classification using Bayes’s theorem was applied and tested to obtain the prior probabilities. Richards and Jia (1999) noted that if sufficient training data are available for each ground cover, they can be used to estimate a probability distribution for a cover type that describes the chance of finding a pixel from class $c_i$ at the position x. It is represented by the symbol $p(x \mid c_i)$. The desired $p(c_i \mid x)$ and the available $p(x \mid c_i)$, estimated from training data, are related by Bayes’s theorem (Freund, 1992):

$$p(c_i \mid x) = \frac{p(x \mid c_i)\, p(c_i)}{p(x)}$$

where $p(c_i)$ is the probability that class $c_i$ occurs in the image. If, for example, 15% of the pixels of an image happen to belong to spectral class $c_i$, then $p(c_i) = 0.15$; $p(x)$ is the probability of finding a pixel from any class at location x. For the reader’s convenience, the maximum likelihood classification using Bayes’s theorem is explained in Appendix D.

The key idea for applying the Dempster-Shafer evidential theory is that the probabilities from the maximum likelihood algorithm can be used as a priori data for the probability mass assignments. Each class may have a different probability. To apply the Dempster-Shafer theory, the a priori probabilities should sum to 1. The probabilities of the classes were therefore normalized so that they sum to 1.
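The normalization step can be sketched in Python as follows. This is only an illustration under stated assumptions: the bands are treated as independent Gaussians (a full maximum likelihood classifier would use the class covariance matrices), the class priors are taken as equal unless supplied, and the function names are hypothetical. The class statistics are the RGB means and standard deviations reported in Table 4-3 for three of the classes, and the test pixel is invented.

```python
import math

# (mean, standard deviation) per band (red, green, blue), from Table 4-3.
CLASS_STATS = {
    "Asphalt": [(122.556, 13.452), (93.413, 9.659), (114.694, 10.92)],
    "Tree": [(84.695, 13.399), (66.697, 8.735), (84.146, 7.725)],
    "Grass": [(116.061, 10.169), (89.071, 6.284), (99.421, 6.838)],
}


def gaussian_pdf(x, mean, std):
    """Value of the normal density N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def class_likelihood(pixel, stats):
    """p(x | c): product of per-band densities (independence assumed)."""
    return math.prod(gaussian_pdf(x, mean, std)
                     for x, (mean, std) in zip(pixel, stats))


def normalized_masses(pixel, class_stats, priors=None):
    """Bayes's theorem per class, scaled so that the values sum to 1."""
    if priors is None:  # assumption: equal priors p(c)
        priors = {name: 1.0 / len(class_stats) for name in class_stats}
    scores = {name: priors[name] * class_likelihood(pixel, stats)
              for name, stats in class_stats.items()}
    total = sum(scores.values())  # plays the role of p(x)
    return {name: score / total for name, score in scores.items()}


pixel = (85.0, 67.0, 84.0)  # hypothetical RGB values
masses = normalized_masses(pixel, CLASS_STATS)
best = max(masses, key=masses.get)
print(best, {name: round(value, 6) for name, value in masses.items()})
```

Dividing by the sum of the class scores is the same as dividing by $p(x)$ in Bayes’s theorem, which is why the resulting values can be used directly as mass assignments that sum to 1.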

Data Fusion Processing Using the Dempster-Shafer Evidential Theory

In this research, we use two sets of image data from two different sensors: the three-band digital color image from the ADP system and the two-band HI image from the ALSM system.

To apply the maximum likelihood classification and obtain the prior probabilities, supervised training sites for each image were carefully selected and analyzed. The training data of the RGB digital color image are shown in Figure 4-2 (a). We used a maximum likelihood decision rule to classify the test image according to the reflectance of the materials in the training set. As a result, the digital color image was classified into seven classes: asphalt, concrete, grass, tree, brick, metal, and unknown. The classification of the RGB image is shown in Figure 4-2 (b). In the concrete class, we cannot clearly distinguish between a building with a concrete roof and a concrete sidewalk without height data. There are two vegetation classes, grass and tree, but we cannot correctly separate them without elevation data. In this classification, the asphalt class contains road, sidewalk, and building information. As another example, the tree class contains both actual tree and grass information. The correlation between data and information in the classes of the digital color image is shown in Table 4-2. The means and standard deviations of each class for the classification of the RGB image are shown in Table 4-3. The a priori probabilities of each class for sample pixel 1 in the digital color image are shown in the second column of Dempster’s combination rule in Table 4-7. According to these probabilities, sample pixel 1 in Figure 4-2 (b) was classified as tree because the probability for tree is the highest for that pixel.

Table 4-2. The Correlation Between Data and Information of Classes for an RGB Digital Color Image

Class       Information
Asphalt     Asphalt Roads, Sidewalk, or Building
Concrete    Concrete sidewalk or Building
Grass       Grass or Tree
Tree        Tree or Grass
Brick       Brick or Building
Metal       Car or Metal on roof
Unknown     Unknown or can be anything

Table 4-3. Summary of the Maximum Likelihood Classification for an RGB Image

            Red                  Green                Blue
Class       Mean      Stdv.      Mean      Stdv.      Mean      Stdv.
Asphalt     122.556   13.452     93.413    9.659      114.694   10.92
Concrete    182.537   28.331     143.975   26.078     167.126   29.084
Tree        84.695    13.399     66.697    8.735      84.146    7.725
Grass       116.061   10.169     89.071    6.284      99.421    6.838
Brick       139.765   8.573      87.635    5.844      103.488   6.198
Metal       252.402   4.862      210.164   5.039      243.845   8.428
Unknown     89.644    8.158      57.044    11.451     100.156   10.045

Figure 4-2. Classification of an RGB image: (a) training set on the RGB image; (b) the maximum likelihood classification image of the RGB image.

The maximum likelihood classification of the HI image has six classes: tree, building, asphalt, grass, concrete, and unknown. The training data of the HI image are shown in Figure 4-3 (a). Since the HI image has a near infrared band, it is very useful for distinguishing vegetation from non-vegetation classes. Because of its height band, the HI image is also very helpful for separating tall objects, such as trees or buildings, from low objects, such as grass or sidewalk. The classification of the HI image is shown in Figure 4-3 (b). In the HI classification image, the building class included actual buildings as well as tree and metal objects. The correlation between data and information in the classes of the HI image is shown in Table 4-4. The a priori probabilities of each class for sample pixel 1 in the HI image are shown in the second row of Dempster’s combination rule in Table 4-7. The means and standard deviations of each class for the classification of the HI image are shown in Table 4-5. According to these probabilities, sample pixel 1 in the HI image was classified as building because the probability for building is the highest for that pixel. For the same sample pixel 1, the maximum likelihood classification of the RGB image assigned the tree class. Sample pixel 1 is therefore a good example of conflicting information from two different data sources in decision-level fusion.

Figure 4-3. Classification of an HI image: (a) training set on the HI image; (b) the maximum likelihood classification image of the HI image.

Table 4-4. The Correlation Between Data and Information of Classes for an HI Image

Class       Information
Tree        Tree, Building, or Metal on the roof
Asphalt     Road, Brick or Car
Building    Tree, Building, or Metal on the roof
Grass       Grass, Sidewalk, Car, or Brick
Concrete    Sidewalk, Grass, Brick or Car
Unknown     Unknown or Anything

Table 4-5. Statistics Summary of the Maximum Likelihood Classification for an HI Image

            Height               Intensity
Class       Mean      Stdv.      Mean      Stdv.
Asphalt     21.266    0.853      73.832    20.112
Concrete    19.985    1.265      155.378   14.529
Tree        34.224    3.149      116.160   47.693
Grass       20.395    1.460      265.164   28.311
Building    30.672    2.041      135.485   17.815
Unknown     34.875    0.131      379.774   26.892

With the classification results from the maximum likelihood decision rule, we consider the case of two sensors, ADP and ALSM, which assign evidence to seven and six propositions, respectively. Sensor $S_1$, the ADP system, observes parametric data and assigns mass probabilities $\{m_{ADP}(\text{Asphalt}), m_{ADP}(\text{Concrete}), m_{ADP}(\text{Grass}), m_{ADP}(\text{Tree}), m_{ADP}(\text{Brick}), m_{ADP}(\text{Metal}), m_{ADP}(\text{Unknown})\}$ to the seven propositions. Similarly, sensor $S_2$, the ALSM system, assigns six mass probabilities $\{m_{ALSM}(\text{Building}), m_{ALSM}(\text{Tree}), m_{ALSM}(\text{Asphalt}), m_{ALSM}(\text{Grass}), m_{ALSM}(\text{Concrete}), m_{ALSM}(\text{Unknown})\}$. Table 4-6 shows Dempster’s combination rule for the fusion of the RGB image from the ADP system and the HI image from the ALSM system. Using the a priori probabilities from the maximum likelihood decision rule for each data set, the mass assignments resulting from the fusion of the information from the RGB image and the HI image can be calculated with Dempster’s rule. One test pixel was chosen, and Dempster’s rule was calculated for it in Table 4-7. The normalization factor is calculated as 1 minus the sum of the four conflict terms (k) in Table 4-6. Since that sum is 0.0, the normalization factor is 1.0 - 0.0 = 1.0.
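The way the class-to-information correlations enter the fusion can be sketched in Python. The sketch assumes that each classified label is replaced by the set of ground objects it may represent (two rows from Table 4-2 and two rows from Table 4-4) and that joint evidence is assigned to the intersection of those sets. The mass values are hypothetical and do not reproduce the values in Table 4-7; the function name `fuse` is also an assumption.

```python
# Ground objects that each classified label may represent.
RGB_INFO = {  # from Table 4-2 (ADP system)
    "Tree": frozenset({"Tree", "Grass"}),
    "Metal": frozenset({"Car", "Metal"}),
}
HI_INFO = {  # from Table 4-4 (ALSM system)
    "Building": frozenset({"Tree", "Building", "Metal"}),
    "Grass": frozenset({"Grass", "Sidewalk", "Car", "Brick"}),
}

# Hypothetical normalized masses for one pixel.
rgb_masses = {"Tree": 0.7, "Metal": 0.3}
hi_masses = {"Building": 0.9, "Grass": 0.1}


def fuse(masses_a, info_a, masses_b, info_b):
    """Dempster's rule on labels mapped to sets of ground objects."""
    joint = {}
    conflict = 0.0
    for label_a, mass_a in masses_a.items():
        for label_b, mass_b in masses_b.items():
            common = info_a[label_a] & info_b[label_b]
            if common:
                joint[common] = joint.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    scale = 1.0 - conflict  # normalization factor
    return {obj: mass / scale for obj, mass in joint.items()}, conflict


fused, conflict = fuse(rgb_masses, RGB_INFO, hi_masses, HI_INFO)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 4), "conflict =", conflict)
```

In this example the RGB label is tree and the HI label is building, yet the two labels are not in conflict: their information sets share the object Tree, so the largest joint mass falls on Tree and the conflict term stays at zero.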

Table 4-7 shows the mass assignments resulting from the fusion of the combined information from sensor 1 (RGB image data) and sensor 2 (HI image data). As a result of the fusion, positive support can be attributed to the individual objects. The most likely object is T, tree, as indicated by

$$\mathrm{Support}_{1,2}(\text{Tree}) = m_{1,2}(T) = 0.93217$$

and

$$\mathrm{Doubt}_{1,2}(\text{Tree}) = m_{1,2}(B) + m_{1,2}(R) + m_{1,2}(Br) + m_{1,2}(C) + m_{1,2}(M) + m_{1,2}(S) + m_{1,2}(G) + m_{1,2}(B \lor S) + m_{1,2}(Br \lor B) + m_{1,2}(C \lor M) + m_{1,2}(R \lor Br \lor C) + m_{1,2}(R \lor S \lor B) + m_{1,2}(G \lor S \lor Br \lor C) = 0.03024$$

where B is Building, R is Road, Br is Brick, C is Concrete, M is Metal, S is Sidewalk, and G is Grass.

The evidence for this conclusion is quite conclusive, as indicated by a small uncertainty and a narrow belief interval for T, tree:

$$\text{Evidential interval}_{1,2}(T) = \mathrm{Plausibility}_{1,2}(T) - \mathrm{Support}_{1,2}(T) = 0.03759$$

$$[\mathrm{Support}_{1,2}(T),\ \mathrm{Plausibility}_{1,2}(T)] = [0.93217,\ 0.96976]$$

where the plausibility of T is

$$\mathrm{Plausibility}_{1,2}(T) = 1 - \mathrm{Doubt}_{1,2}(T) = 0.96976$$

The next most likely object is the building class, which has $[\mathrm{Support}_{1,2}(B),\ \mathrm{Plausibility}_{1,2}(B)] = [0.03024,\ 0.06783]$; this interval is equally narrow. If additional information becomes available, it can easily be combined with the previous evidence to possibly increase the conclusiveness of the recognition.

Table 4-6. Dempster’s Combination Rule of an RGB Image and an HI Image

Results 

According to Dempster’s combination rule, a new classification image was created with eight new classes: building, tree, sidewalk, road, grass, brick, car, and metal. The classification image produced by the Dempster-Shafer evidential theory is shown in Figure 4-4.

This classification image clearly shows the brick class, which was not detected in the HI image, and a distinct boundary between tree and grass, which was not clear in the RGB image. It provides greater classification detail and shows that fusing and sharing different data sources is a powerful method for object recognition that is not possible with a single data source.

In the accuracy assessment of the Dempster-Shafer classification image, the overall accuracy is 94.2381%. The road class has the highest producer’s accuracy, 99.67%, with a user’s accuracy of 87.32%. The metal class has the lowest producer’s accuracy, 68.63%, with a user’s accuracy of 100%. Table 4-8 shows the accuracy assessment of the data fusion classification by the Dempster-Shafer evidential theory. The assessment shows that all ground object classes except metal have high producer’s accuracies (above 84%), and all classes have user’s accuracies above 87%.

Compared to the previous experiments (a supervised pattern recognition technique and the expert system approach), the Dempster-Shafer evidential theory has a higher overall accuracy. We will discuss the comparison between the three data fusion experiments in Chapter 5.


Figure 4-4. Data fusion classification map from the Dempster-Shafer algorithm. Legend classes: Building, Asphalt Road, Concrete Sidewalk, Median Brick, Grass, Car, Tree, Metal on the roof.


Summary 

In  this  research,  we  investigated  the  Dempster-Shafer  evidential  theory  for  the  fusion 
of  two  images  from  different  data  sources.  The  key  problem  in  applying  this  approach 
was  the  acquisition  of  a prior  probability  of  ground  object  categories.  We  investigated  the 
use  of  probability  from  a maximum  likelihood  classification  as  prior  probability.  We  used 
two  sets  of  image  data  from  two  different  sensors:  three-band  digital  RGB  color  image 
from  the  ADP  system  and  two-band  HI  image  from  the  ALSM  system. 


Table 4-8. Confusion Matrix of Data Fusion Classification by Dempster-Shafer Evidential Theory (Pixels)

CLASS      Building   Tree   Grass   Road   Sidewalk   Brick   Car   Metal   Total
Building        590     28       0      0          1       0     0      32     651
Tree              7    760       6      0          0       0     0       0     773
Grass             4      0     659      0          6       1     0       0     670
Road              0      0       0    303         41       3     0       0     347
Sidewalk         19      0      29      0        750       2    30       0     830
Brick             0      0       0      1          0     152     1       0     154
Car               0      0       0      0          0       0   167       0     167
Metal             0      0       0      0          0       0     0      70      70
Total           620    788     694    304        798     158   198     102    3662

Overall Accuracy = (3451/3662) 94.2381%    Kappa Coefficient = 0.9304

Class      Producer’s Accuracy (%)   User’s Accuracy (%)   Producer’s Accuracy (Pixels)   User’s Accuracy (Pixels)
Grass                        94.96                 98.36                        659/694                    659/670
Tree                         96.45                 98.32                        760/788                    760/773
Building                     95.16                 90.63                        590/620                    590/651
Sidewalk                     93.98                 90.36                        750/798                    750/830
Car                          84.34                100.00                        167/198                    167/167
Metal                        68.63                100.00                         70/102                      70/70
Road                         99.67                 87.32                        303/304                    303/347
Brick                        96.20                 98.70                        152/158                    152/154
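The summary figures in Table 4-8 can be recomputed from the confusion matrix with a short Python sketch. It assumes that the rows are the classified (map) classes and the columns are the reference classes, which is the orientation implied by the reported producer’s and user’s ratios. The function name is hypothetical.

```python
CLASSES = ["Building", "Tree", "Grass", "Road",
           "Sidewalk", "Brick", "Car", "Metal"]

# Pixel counts from Table 4-8 (rows: classified, columns: reference).
MATRIX = [
    [590, 28, 0, 0, 1, 0, 0, 32],
    [7, 760, 6, 0, 0, 0, 0, 0],
    [4, 0, 659, 0, 6, 1, 0, 0],
    [0, 0, 0, 303, 41, 3, 0, 0],
    [19, 0, 29, 0, 750, 2, 30, 0],
    [0, 0, 0, 1, 0, 152, 1, 0],
    [0, 0, 0, 0, 0, 0, 167, 0],
    [0, 0, 0, 0, 0, 0, 0, 70],
]


def accuracy_report(matrix):
    """Return overall accuracy, kappa, producer's and user's accuracies."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size))
                  for c in range(size)]
    diagonal = [matrix[i][i] for i in range(size)]
    overall = sum(diagonal) / total
    # Agreement expected by chance from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1.0 - chance)
    producers = [d / c for d, c in zip(diagonal, col_totals)]
    users = [d / r for d, r in zip(diagonal, row_totals)]
    return overall, kappa, producers, users


overall, kappa, producers, users = accuracy_report(MATRIX)
print(f"Overall accuracy = {100 * overall:.4f}%")
print(f"Kappa coefficient = {kappa:.4f}")
for name, prod, user in zip(CLASSES, producers, users):
    print(f"{name:<9} producer's {100 * prod:6.2f}%  user's {100 * user:6.2f}%")
```

The producer’s accuracy divides each diagonal count by its column (reference) total, and the user’s accuracy divides it by its row (classified) total, which matches the pixel ratios listed in the table.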


To apply the maximum likelihood classification and obtain the prior probabilities, supervised training sites for each image were carefully selected and analyzed. Two classification images were created, and each class in each classification image had independent class probabilities that can be used as prior probabilities. Using these a priori probabilities from the maximum likelihood decision rule for each data set, the mass assignments resulting from the fusion of the information from the RGB image and the HI image were calculated with Dempster’s rule, and a new classification image was created.

In the accuracy assessment of the Dempster-Shafer classification image, the overall accuracy is 94.2381%, and all ground object classes except metal had high producer’s accuracies. This image also had high user’s accuracies for all classes.

This research shows that data fusion using the Dempster-Shafer evidential theory can be a powerful alternative method to combine and share each set of data for object recognition.


CHAPTER  5 
CONCLUSIONS 

The  fusion  of  complementary  or  redundant  data  from  multiple  sensors  can  be  useful  in 
creating  more  consistent  recognition  of  land  cover  patterns  and  improving  precision.  In 
our  research,  we  have  investigated  possible  strategies  for  data  fusion  with  regard  to  a 
contemporary  problem  in  the  mapping  sciences:  how  to  best  merge  passive  reflected 
spectral  data  for  aerial  cameras  with  height  and  intensity  data  from  airborne  laser 
systems. 

As  a preprocessing  step  in  our  fusion  experiments,  we  have  developed  a direct  digital 
image  georeferencing  (DDIG)  procedure  to  coregister  the  spectral,  elevation,  and  laser 
intensity  data.  This  procedure  solved  the  important  problem  of  converting  the  central 
perspective  of  the  digital  photographs  to  an  orthographic  image,  which  can  be  properly 
merged with the topographic measurements from the laser data. In our images, we used
only measured data points, not interpolated values. We have called the resulting
georeferenced  image  a pseudo-orthorectified  image  (PORI)  because  we  used  only  GCPs 
at  the  ground  level.  Consequently,  in  the  PORI,  some  elevated  objects  are  not  in  their 
correct  planimetric  position.  In  our  images,  the  shifts  were  as  large  as  2 pixels  (about  40 
cm),  but  this  accuracy  is  sufficient  to  support  our  sensor  fusion  experiments. 
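
The back projection at the heart of DDIG rests on the collinearity condition. The following is a minimal sketch of that projection, not our production code: the function name, the convention that the rotation matrix maps object-space axes to image-space axes, and the example numbers (a vertical photograph taken from 600 m with a 28.45 mm lens) are assumptions chosen for illustration.

```python
import numpy as np

def project_to_image(ground_xyz, camera_xyz, rotation, focal_length,
                     principal_point=(0.0, 0.0)):
    """Project an object-space point into image space (collinearity equations).

    rotation is assumed to rotate object-space vectors into the image frame;
    all lengths are in meters, so the result is in meters on the focal plane.
    """
    d = rotation @ (np.asarray(ground_xyz, float) - np.asarray(camera_xyz, float))
    x0, y0 = principal_point
    x = x0 - focal_length * d[0] / d[2]
    y = y0 - focal_length * d[1] / d[2]
    return x, y

# Hypothetical vertical photograph: camera 600 m above a ground point 60 m east.
x_img, y_img = project_to_image((60.0, 0.0, 0.0), (0.0, 0.0, 600.0),
                                np.eye(3), 0.02845)
```

In this simplified case the image coordinate reduces to the familiar scale relation x = f X / H, which is a convenient check on the signs.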

As  a framework  for  our  discussions,  we  have  followed  the  sensor  fusion  paradigm 
adopted  in  the  optical  engineering  community.  Within  it,  our  DDIG  procedure  may  be 
defined  as  a data-level  fusion  technique  because  we  use  the  height  data  to  generate 
corrections for the photographs. For classification and object recognition purposes, we
investigated the use of a supervised statistical pattern recognition technique, maximum
likelihood classification (MLC), and two decision-level techniques: an expert system, and
a method based on the use of Dempster-Shafer evidential theory. We have categorized
the  MLC  experiment  as  a feature-level  fusion  experiment,  even  though  we  have  not 
generated  spatial  features  in  the  two  data  sets  by  independently  segmenting  them.  Here, 
we  extend  the  concept  of  feature  to  include  a multi-channel  data  vector,  but  the  vector 
retains the same object space relationship as a single pixel in the PORI images. The two
decision-level experiments (the expert system and Dempster-Shafer evidential theory) were
also performed at the spatial resolution of the PORI data, and were not based on spatial
segmentations.

In  our  research  on  the  use  of  the  maximum  likelihood  classification  method  we 
generated  a multi-dimensional  data  vector  by  combining  the  red,  green,  and  blue  channels 
of  the  relatively  high  resolution  digital  color  images  with  the  near  infrared  (NIR) 
intensity  data  and  height  data  produced  by  the  laser  system.  We  used  the  MLC  technique 
to  produce  eight  land-cover  classes:  buildings,  trees,  grass,  roads,  sidewalks,  bricks,  cars, 
and  metal.  We  initially  performed  this  work  on  the  two  data  sets  independently  (RGB  and 
HI),  and  then  on  the  combined  data  in  various  combinations  of  the  five  channels.  The 
accuracy  assessments  of  the  two  independent  data  sets  revealed  that  each  contains  useful 
data to classify different objects. For example, concrete buildings and bricks are well
determined on the RGB image, and grass and asphalt roads on the HI image. To share
these strengths, we combined the five available channels in six combinations: red, green,
and intensity (RGI); red, green, and height (RGH); red, height, and intensity (RHI); red,
green, height, and intensity (RGHI); red, blue, height, and intensity (RBHI); and red,
green, blue, height, and intensity (RGBHI). We conducted accuracy assessments on the
classification maps using a common set of test areas. We found the RBHI image had the
highest accuracy of 91.32%, but no band combination image classified cars and metal
classes with acceptable accuracies in the maximum likelihood classification (see Table
5-1). We found it interesting that the highest overall accuracy was not produced using the
combination  of  all  data  channels  (RGBHI).  In  our  data,  this  occurred  because  several 
object  space  features  were  not  easily  separated  in  the  green  channel.  This  result  may 
indicate  that  a weighted  approach  should  be  investigated. 
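
To make the multi-channel data vector concrete, the following is a minimal sketch of a Gaussian maximum likelihood classifier applied to a four-channel (red, blue, height, intensity) vector. It is not the commercial implementation we used: the function names, the equal-prior assumption, and the synthetic training samples are all illustrative assumptions.

```python
import numpy as np

def train_mlc(samples_by_class):
    """Estimate a mean vector and covariance matrix for each class."""
    stats = {}
    for name, samples in samples_by_class.items():
        x = np.asarray(samples, float)
        stats[name] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats

def classify_mlc(pixel, stats):
    """Return the class with the largest Gaussian log-likelihood (equal priors)."""
    best_name, best_score = None, -np.inf
    for name, (mean, cov) in stats.items():
        diff = np.asarray(pixel, float) - mean
        _, logdet = np.linalg.slogdet(cov)
        score = -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Synthetic training data (hypothetical values): columns are R, B, H, I.
rng = np.random.default_rng(0)
training = {
    "grass": rng.normal([60, 50, 0.2, 180], [5, 5, 0.3, 10], size=(200, 4)),
    "building": rng.normal([150, 150, 8.0, 90], [5, 5, 0.3, 10], size=(200, 4)),
}
stats = train_mlc(training)
label = classify_mlc([62.0, 52.0, 0.1, 175.0], stats)
```

Dropping or adding a channel only changes the length of the vector, which is how the different band combinations can be compared on a common set of test areas.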

Using  a few  simple  production  rules,  we  implemented  an  expert  system  for  the 
extraction  of  information  from  the  five  channels  of  data.  We  set  two  levels  for 
classification and fusion research using the expert system. Level-1 is the classification
of each source of image data using production rules designed only for that type of data,
and Level-2 is the fusion of data by combining the results from each Level-1
classification. The Level-2 classification image has 92.7362% overall accuracy, and
individual class accuracies ranged from 68.35% for brick to 98.46% for tree (see Table
5-1).

As  a second  experiment  in  decision-level  data  fusion,  we  investigated  the  use  of 
Dempster-Shafer  evidential  theory.  The  key  problem  in  applying  this  approach  is  the 
acquisition  of  prior  probabilities  of  ground  object  categories  for  each  of  the  two  data  sets. 
In  our  research,  we  generated  prior  probabilities  using  maximum  likelihood 
classification.  Specifically,  we  generated  a mean  and  standard  deviation  for  each  class 
using  supervised  classification,  and  then  generated  probability  maps  for  each  class  using 
the MLC in a commercial software package (ENVI). Using these probability maps as a
priori probabilities, we generated the mass assignments necessary in the Dempster-Shafer
approach. In the accuracy assessment of the classification image produced with
Dempster-Shafer evidential theory, the overall accuracy was 94.2381%, and all ground
object classes except metal had a high producer's accuracy. This image also had high
user's accuracies for all classes (see Table 5-1).
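
The combination step can be illustrated with a minimal sketch of Dempster's rule for two mass assignments. The class names and mass values below are hypothetical and are not taken from our probability maps; the function name is likewise an assumption.

```python
def combine_dempster(m1, m2):
    """Combine two mass assignments (dicts keyed by frozenset) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses for one pixel; THETA is the whole frame of discernment.
THETA = frozenset({"tree", "grass", "building"})
m_rgb = {frozenset({"tree"}): 0.6, frozenset({"grass"}): 0.3, THETA: 0.1}
m_hi = {frozenset({"tree"}): 0.7, frozenset({"building"}): 0.2, THETA: 0.1}

fused = combine_dempster(m_rgb, m_hi)
best = max(fused, key=fused.get)
```

In this example both sources lean toward the tree hypothesis, so the fused mass for tree is larger than either source assigned on its own, while the conflicting mass is removed by the normalization.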

To compare the results of our three experiments: the highest overall accuracy of the
supervised statistical pattern recognition technique, using the maximum likelihood
algorithm, is 91.32% (RBHI); the overall accuracy of rule-based classification is 92.74%;
and the overall accuracy of Dempster-Shafer evidential theory is 94.24%. In our
experiments, the Dempster-Shafer data fusion technique provides the most detailed
classification and the highest overall accuracy. However, the Dempster-Shafer
classification did not produce the highest accuracy in all classes. The comparisons of
each class and overall accuracy among the three techniques are shown in Table 5-1.

Table 5-1. The Comparison of Each Class and Overall Accuracy among Three Data
Fusion Techniques (%)

Class                Maximum likelihood       Expert system    Dempster-Shafer
                     classification (RBHI)                     evidential theory
Building             88.87                    90.65            95.16
Concrete Sidewalk    97.73                    97.62            93.98
Tree                 98.22                    98.86            96.45
Grass                98.56                    92.22            94.96
Asphalt Road         100.0                    90.79            99.67
Median Brick         84.81                    68.35            96.20
Metal on the roof    47.06                    76.47            68.63
Car                  59.09                    87.88            84.34
Overall accuracy     91.32                    92.74            94.24

These simple comparisons of overall accuracies give a general statement as to the
success of the three fusion experiments, but they are not sufficient to explain the
differences among them. To explore this, we compared each class produced in the three
experiments by differencing them.
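
The class-by-class differencing can be sketched as follows. This is an illustrative reconstruction under the assumption that each classification image is an integer array of class codes; the function name and the small arrays are hypothetical.

```python
import numpy as np

def class_difference(map_a, map_b, class_id):
    """Return +1 where only map_a assigns class_id, -1 where only map_b does, else 0."""
    a = (np.asarray(map_a) == class_id).astype(np.int8)
    b = (np.asarray(map_b) == class_id).astype(np.int8)
    return a - b

# Two tiny hypothetical classification images (class codes 1, 2, 3).
mlc_map = np.array([[1, 1, 2], [3, 2, 2]])
ds_map = np.array([[1, 2, 2], [3, 3, 2]])
diff = class_difference(mlc_map, ds_map, class_id=2)
```

Nonzero pixels in the difference image mark exactly where two techniques disagree about a given class, which is what we inspected for each class in turn.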

In the building class, the maximum likelihood and rule-based approaches both
classified some parts of buildings as trees. The dark blue tiles on the top of the building
were  classified  as  trees  in  the  rule-based  approach.  The  building  class  was  classified  most 
correctly  in  the  Dempster-Shafer  approach. 

In the sidewalk and tree classes, the three investigated techniques share the same
misclassification: they classify some concrete sidewalk under trees as buildings. This
misclassification originates in the laser data. The laser pulses from the ALSM system may
hit tree branches over the concrete sidewalk, so those pixels can be classified as tall
objects or trees in the HI image, while the same pixels in the RGB image can appear as
concrete sidewalk. A similar effect can occur if we use DEM data instead of individual
laser measurements of height, because techniques for the generation of DEMs require
spatial interpolation of the height data to produce a regularly sampled grid.

In  the  grass  class,  the  maximum  likelihood  classification  produced  the  best  result. 
However, some grass areas are bare ground without actual grass, and these areas should
not strictly be classified as grass. The maximum likelihood classification assigned these
areas to the grass class, but the rule-based classification and the Dempster-Shafer
evidential theory assigned them to non-grass classes, such as the concrete sidewalk class.

In this research, we show that the fusion of digital color image, DEM, and intensity
data is a powerful tool for the image analyst wishing to take advantage of object space
classification. For example, areas of vegetation can be very well distinguished
because of their relatively high reflectivity in the infrared intensity band. With height
information, trees are clearly distinguished from grass, and buildings from sidewalks.

The  result  shows  that  the  fusion  of  data  from  different  sources  can  share  redundant 
and  complementary  information,  and  also  provide  greater  classification  detail  and 
accuracy. This research is a starting point toward the ultimate goal in mapping science,
which is to completely describe the object space with regard to the type and location of
the individual objects within it, and to represent this understanding in an iconic form: a
map.


CHAPTER  6 

DISCUSSION  AND  RECOMMENDATION  FOR  FURTHER  WORK 

Among the trial band combinations of the RGB color image and HI image, the RBHI band
combination in the maximum likelihood classification has the highest overall accuracy.
After adding one more band, the green band, the accuracies of some classes decreased,
which lowered the overall accuracy of classification in the RGBHI band combination. The
simple assumption of data fusion is that more data would provide better accuracy of
identification. However, the result from the RGBHI band combination contradicts this
assumption.

In the green channel, the grass, brick, and road classes overlap closely and are not
easily separable. The sidewalk, building, metal, and car classes exhibit similar
overlaps. Consequently, in the RGBHI combination, the accuracies of the road and
sidewalk classes decreased (when compared to the RBHI combination). This result
illustrates  the  danger  of  assuming  the  optimal  classification  is  achieved  by  simply 
extending  the  data  vector  to  include  all  possible  channels  of  data.  Yet,  intuitively  we 
think  we  could  benefit  from  the  exploitation  of  all  the  data.  This  indicates  a possible  area 
for  future  work  based  on  a weighted  fusion  of  the  data.  It  may  be  possible,  for  example, 
to  assign  weights  which  are  inversely  proportional  to  the  variance  in  data  for  each 
channel,  for  each  class. 
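
One way such weights might be computed is sketched below. This is only an illustration of the idea, not a tested fusion scheme: the function name and sample values are hypothetical, and the sketch assumes the channels have already been scaled to comparable units, since raw variances depend on the units of each channel.

```python
import numpy as np

def inverse_variance_weights(samples):
    """Per-channel weights proportional to 1/variance, normalized to sum to 1.

    samples: training pixels for ONE class, shape (n_pixels, n_channels).
    """
    variance = np.var(np.asarray(samples, float), axis=0, ddof=1)
    weights = 1.0 / variance
    return weights / weights.sum()

# Hypothetical two-channel training samples for a single class.
class_samples = [[1.0, 10.0], [3.0, 14.0]]
weights = inverse_variance_weights(class_samples)
```

A channel in which a class is tightly clustered receives a large weight for that class, while a channel in which the class is widely spread contributes less.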

The  classification  using  the  expert  system  technique  produced  excellent  accuracy  for 
tested  ground  objects  and  has  been  shown  to  be  an  effective  approach  in  certain 
circumstances. Production rules are easy to make and apply, but the key problem is how
to make reasonable rules, and to generate a reasonable number of them. Since this
technique depends on the rules that can be created by users or operators, the initial data
set should be carefully analyzed to find applicable rules. An additional step may be
needed to understand the data set to be used, and to analyze the geometric characteristics
of the study areas. Every rule is suitable only for specific data sets and study areas. When
the data set or the study areas change, new rules should be created case by case.
Clearly,  the  disadvantage  to  this  approach  lies  in  the  difficulty  of  generating  a set  of  rules 
that  are  sufficiently  general  to  be  used  in  a number  of  project  areas. 

In  Dempster-Shafer  evidential  theory,  we  require  prior  probabilities  of  ground  object 
categories  for  each  sensor.  In  this  work,  we  generated  them  using  a maximum  likelihood 
classification. This approach may be good for multispectral images with a small number
of bands, but we cannot be sure it is the best approach for higher dimensional data sets,
specifically
hyperspectral  data.  For  further  research,  several  spectral  analysis  methods  should  be 
investigated  to  generate  the  required  a priori  information.  The  spectral  angle  mapper 
(SAM)  or  spectral  unmixing  method  may  be  useful  approaches  for  accomplishing  this 
with  hyperspectral  data. 
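
For reference, the spectral angle used by SAM is the angle between a pixel spectrum and a reference spectrum treated as vectors. The sketch below shows only that angle computation; the function name and spectra are hypothetical, and how the angles would be converted into the required a priori information is left open.

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    p = np.asarray(pixel, float)
    r = np.asarray(reference, float)
    cos_angle = p @ r / (np.linalg.norm(p) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# A spectrum that is a brighter copy of the reference has an angle of ~0,
# which is why the measure is insensitive to overall illumination.
angle_same_shape = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
angle_orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
```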

In the processing of ALSM data, ground object removal is an area of active research.
This research may have significant value in that area. Through the application of
classification maps, it may be possible to develop new approaches to filtering which
remove only one class. For example, one could use our classifications to remove the
trees but leave the buildings. Each class image from the final classification shows one
object class only. We may also be able to generate new types of products. We can overlay
a certain object class with its elevation data on the GIS base image map. For example, the
tree class overlaid on the base map can be used for the estimation and management of
trees in the city.
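
Class-selective filtering of this kind could be as simple as the following sketch, which assumes each laser point already carries the class code of the pixel it falls in. The class codes, function name, and points are hypothetical.

```python
import numpy as np

TREE, BUILDING, GRASS = 1, 2, 3  # hypothetical class codes

def remove_class(points, labels, class_to_remove):
    """Drop the laser points labeled with one class and keep all others."""
    points = np.asarray(points, float)
    keep = np.asarray(labels) != class_to_remove
    return points[keep]

# Four hypothetical (X, Y, Z) points with their class labels.
points = [[0.0, 0.0, 12.0], [1.0, 0.0, 9.5], [2.0, 0.0, 0.3], [3.0, 0.0, 11.0]]
labels = [TREE, BUILDING, GRASS, TREE]
without_trees = remove_class(points, labels, TREE)
```

The same mask, inverted, yields the single-class layer (for example, trees with their elevations) that could be overlaid on a GIS base map.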

Multi-spectral data or hyperspectral data, instead of the natural color image and the
low resolution infrared intensity image, can be alternative data sources for fusion with the
data from the ALSM system to extract more detailed information using the algorithms
tested in this research.

UF researchers are investigating the use of hyperspectral data in the classification
scheme, with the goal of extracting information using the two decision-level data fusion
techniques in an airport area.


APPENDIX  A 

DESCRIPTION  OF  THE  DATA  ACQUISITION  SYSTEM 

This appendix describes the system that can produce intensity, DEM, and digital
photographs at the same time. The system comprises an integrated GPS/INS, an
Airborne Laser Swath Mapping (ALSM) system using an Optech Inc. model ALTM
1210 laser mapping system, and an Airborne Digital Photography (ADP) system using a
high optical resolution Kodak DCS 420 color digital camera.

System  Components 

The integrated system produces the following types of data:

• Three-dimensional measurements of laser intensity, which are proportional to
reflectivity at the laser wavelength, and accurate elevation data, from the Airborne
Laser Swath Mapping (ALSM) system

• Two-dimensional measurements of RGB digital numbers, which are proportional
to object-space radiance, from the Airborne Digital Photography (ADP) system

The  University  of  Florida  system  is  mounted  in  a Cessna  337  in-line  twin-engine  aircraft, 
equipped  with  a Starlink  real-time  GPS  navigation  unit,  and  a L1/L2  microstrip  antenna 
for  geodetic  quality  GPS  phase  difference  observations. 

Airborne  Laser  Swath  Mapping  System 

The  ALSM  is  a scanning  laser  ranging  system  that  can  be  used  to  produce  accurate 
high-resolution  topographic  digital  maps.  A pulsed  laser  ranging  system  is  mounted  in 
the  aircraft  equipped  with  a precise  kinematic  Global  Positioning  System  (GPS)  receiver 
and an inertial measurement unit (IMU). By accurately timing the round-trip travel time
of the light pulses from the aircraft to the ground (water, foliage, buildings, or other
surface features), it is possible to determine the range with a precision of one centimeter
or better. When such a laser ranging system is mounted in an aircraft with the scan line
perpendicular  to  the  direction  of  flight,  it  produces  a saw-toothed  pattern  of  ranges  within 
a strip  centered  directly  along  the  flight  path  (see  Figure  A-1). 
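
The ranging principle can be illustrated with a short calculation. This is a simplified sketch that uses the vacuum speed of light and ignores atmospheric refraction and instrument delays; the function names are assumptions made for the example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum

def range_from_round_trip(seconds):
    """One-way range for a measured round-trip travel time."""
    return SPEED_OF_LIGHT * seconds / 2.0

def timing_needed_for(range_resolution_m):
    """Round-trip timing resolution needed to resolve a given range step."""
    return 2.0 * range_resolution_m / SPEED_OF_LIGHT

round_trip_600m = 2.0 * 600.0 / SPEED_OF_LIGHT   # about 4 microseconds
timing_for_1cm = timing_needed_for(0.01)         # about 67 picoseconds
```

Resolving one centimeter in range therefore corresponds to resolving the round-trip time to a few tens of picoseconds.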

The  size,  weight,  and  power  requirements  of  the  earlier  systems  made  it  necessary  to 
operate  from  a costly  four-engine  aircraft,  and  the  technique  was  too  expensive  for  many 
applications  (Guenther  1985;  Krabill  et  al.,  1995a,b).  However,  recent  advances  in  laser 
hardware  design  have  enabled  the  development  of  high-speed,  solid-state  lasers  that  are 
lightweight  and  compact  enough  to  fit  into  small  aircraft,  either  fixed-wing  or  helicopters. 
The  commercial  application  for  topographic  maps  has  developed  within  the  last  five 
years.  A scanning  mirror  within  the  laser  sensor  unit  scans  at  a programmable  rate  to 
direct the laser pulse in a pre-designated swath pattern. All variables associated with the
attitude of the aircraft (pitch, roll, and heading) are monitored and recorded.

The  system  consists  of  a high-accuracy  laser  rangefinder  and  a programmable 
precision  scanner  that  work  in  tandem  with  a high-accuracy  IMU  and  GPS  to  locate  and 
record  the  absolute  position  and  intensity  of  targets  below.  Targets  may  be  open  ground, 
tree  tops,  power  wires,  hydroelectric  towers,  buildings,  roads,  bridges,  and  so  forth. 

As  the  aircraft  flies  over  the  area  to  be  surveyed,  laser  pulses  are  emitted  at  a fixed 
pulse  repetition  frequency  (PRF).  The  laser  pulses  are  time-stamped  according  to  GPS 
time  and  thereby  correlated  to  GPS  position  data.  The  speed  with  which  data  are 
collected  by  an  airborne  scanning  laser  also  sets  this  method  apart  from  other  remote 
sensing technologies. The primary advantages of ALSM compared to traditional ground
survey and photogrammetric techniques are the speed at which the survey can be
completed and the results delivered, the cost per unit area, the accuracy, and the ability to
complete the survey under a wide variety of lighting conditions, including all seasons and
at night.

The University of Florida ALSM system, as used during our experiments, is an Optech
Model ALTM 1210. We show the system in Figure A-2 (a), and summarize the technical
specifications in Table A-1. University of Florida researchers have performed a number
of projects during the past four years to test and demonstrate the capabilities of ALSM
techniques, to improve the data reduction and analysis procedures, and to explore the
most useful way to provide the results to users. The projects have included mapping of
sandy beaches subject to erosion and tropical storm damage, environmentally sensitive
marshlands, and measurement of obstruction to the airspace at airports (Carter &
Shrestha, 1997; Carter et al., 1998; Shrestha & Carter, 1998). The results of these
projects indicate that the system can yield heights accurate to 5 to 10 centimeters, and
detect linear topographic features with height differences from 2 to 5 centimeters (Carter,
1998; Tuell et al., 2000).

Figure A-1. Schematic of Airborne Laser Swath Mapping technology.

Table A-1. Specifications of the Optech Model ALTM 1210

ALSM System (ALTM 1210)

Operating altitude:        330 - 2000 m
Range accuracy:            10 cm single-shot
Range resolution:          1 cm
Relative accuracy:         A-4 cm @ 2 kHz, 5-10 cm @ 10 kHz
Scan angle:                Variable from 0 to ±20°
Options:                   Intensity data
                           Simultaneous first and last pulse measurements
                           Extended altitude (up to 2000 m) operation
Swath width:               Variable from 0 to 0.68 x altitude
Angle accuracy:            0.05°
Angle resolution:          0.01°
Scan frequency:            Variable; depends on scan angle; for example, 30 Hz for
                           ±20° scan, 50 Hz for ±10° scan
Roll and pitch accuracy:   0.04°
Heading accuracy:          0.05°
Supported GPS receivers:   Ashtech Z12 or Trimble 4000SSE
Laser wavelength:          1064 nm
Laser repetition rate:     100 Hz to 10 kHz
Beam divergence:           0.30 mrad
Laser classification:      Class IV laser product (FDA CFR 21)
Eyesafe range:             308 m (single shot)
Power requirements:        28 VDC @ 30 A
Operating temp.:           10-35 °C
Humidity:                  0 - 95% noncondensing
Sensor:                    Fits all existing camera mounts or can be directly mounted
                           to the floor in a twin-engine aircraft such as Cessna 337
Control rack:              1 stackable vibration-isolated transportable case
Dimensions:                60 x 60 x 75 cm, excluding GPS
Weight:                    50 kg including shipping covers and cables
Video output:              NTSC or PAL (annotated video out)
Data storage:              12-hour capacity (8 mm digital data tape)

The average of the differences in orthometric heights between the low-altitude
photogrammetric method and the ALSM method shows that the ALSM technology
provides orthometric heights as reliable as those derived from the photogrammetric
method. Fast and economical collection of high-density X, Y, Z points on the ground
with a single-engine aircraft makes the ALSM system one of the most promising
technologies in decades for highway applications, as well as other mapping applications,
and it may therefore revolutionize mapping technology in the future
(Shrestha et al., 1997; Shrestha et al., 1999).

Airborne  Digital  Photography  Imaging  System 

Digital cameras have also evolved rapidly. These systems have reached a performance
level whereby they can be integrated into airborne LIDAR systems to provide the
necessary visual coverage of the area. There is a high demand for high resolution
ortho-images for large-scale mapping applications. In addition, the need for digital color
images as an additional data source to improve various applications, such as feature
extraction or classification, is also increasing.

The University of Florida Airborne Digital Photography (ADP) system was installed
alongside the ALSM system in the in-line twin-engine aircraft. It produces digital color
images using a Kodak DCS 420 digital camera that has a single CCD array. A Bayer
mosaic pattern of pixel filters is placed on the face of the CCD array to filter light so
that only red, green, or blue light reaches any given pixel. Because the Bayer mosaic
pattern consists of rows of red-green-red-green... and blue-green-blue-green... pixels,
only 25% of the pixels are devoted to blue, 25% to red, and 50% to green. As a result, the
green channel has greater spatial resolution than the blue or red channels. The digital
camera CCD array is rectangular, with approximately 1524 by 1012 elements, for a total
of about 1.5 million elements, and the principal point is at (X: 775, Y: 512). At the typical
flying height of 600 meters with a 28.45 mm focal length lens, each 9 micrometer pixel
corresponds to approximately 20 cm on the ground. Figure A-2 (b) shows the ADP system
at the University of Florida.
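
The pixel footprint quoted above follows from the image scale. The short calculation below is a sketch that assumes a vertical photograph over flat terrain; the function name is hypothetical.

```python
def ground_sample_distance(flying_height_m, focal_length_m, pixel_size_m):
    """Ground footprint of one pixel for a vertical photograph over flat terrain."""
    return flying_height_m * pixel_size_m / focal_length_m

gsd = ground_sample_distance(600.0, 28.45e-3, 9e-6)  # roughly 0.19 m
footprint_m = (1524 * gsd, 1012 * gsd)               # roughly 289 m by 192 m
```

The computed value of about 19 cm is consistent with the nominal 20 cm pixel used in the body of this work.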

The  integrated  ALSM/ADP  system  can  simultaneously  collect  ALSM  data  and  ADP 
digital  photographs. 



Figure  A-2.  The  University  of  Florida  System:  (a)  the  ALSM  system;  (b)  the  ADP 
system. 


APPENDIX  B 

DIRECT  DIGITAL  IMAGE  GEOREFERENCING 
This appendix describes direct image georeferencing and back projection, the
preprocessing step for this research, and the recent demand for georeferenced digital
color images as GIS base maps.

Airborne  GPS  Positioning  and  Georeferencing 
The  introduction  of  GPS  technology  in  the  late  1980s  (the  system  became  fully 
operational  in  1993)  together  with  the  advances  in  computational  algorithms  resulted  in 
the  development  of  DGPS-supported  aerotriangulation. 

Since  the  mid-1980s,  airborne  GPS  positioning  experiments  have  been  conducted  for 
either  scientific  or  engineering  applications.  Krabill  and  Martin  (1987)  compared  the  GPS 
vertical  trajectory  to  airborne  LIDAR  measured  altitude  data  from  the  NASA  Airborne 
Oceanographic  LIDAR  (AOL).  The  two  data  sets  were  compared  with  a relative  accuracy 
of  12  centimeters  RMS  under  poor  GPS  satellite  geometry  for  vertical  positioning. 
Analysis  indicates  that  1 to  2 centimeters  relative  vertical  positioning  is  achievable  with 
carrier  phase  tracking  receivers  and  good  GPS  geometry.  Mader  (1986)  also  described 
the  use  of  carrier  phase  measurement  to  position  an  aircraft  in  flight  and  compared  the 
vertical profiles of the aircraft, as determined by GPS and a laser altimeter (Cannon &
Shi, 1995).

Mader and Lucas (1989) described an experiment in which photogrammetry was used
to obtain independent three-dimensional estimates of position for an aircraft in flight,
allowing a more precise evaluation of the accuracy of the GPS positions. One important
application of this capability was examined by Lucas (1987), who showed that if the
positions of a photogrammetric camera can be independently determined to an accuracy
of 5 cm, then comparable accuracies may be obtained for points on the ground with little
or no ground control required. Other reported results have demonstrated that the
positional accuracy of the aircraft, with respect to the ground monitor receiver, is at the
decimeter level (Keel et al., 1989; Cannon et al., 1990, 1992) when using double
difference carrier phase measurements (Wells et al., 1986).
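
The value of double differencing is that receiver and satellite clock errors cancel. The sketch below demonstrates this with synthetic observations expressed in meters; the receiver names, satellite numbers, and values are hypothetical, and carrier phase ambiguities and atmospheric delays are deliberately left out.

```python
def double_difference(obs, remote, monitor, sat_j, sat_k):
    """Double difference of observations stored in a dict keyed by (receiver, satellite)."""
    single_j = obs[(remote, sat_j)] - obs[(monitor, sat_j)]
    single_k = obs[(remote, sat_k)] - obs[(monitor, sat_k)]
    return single_j - single_k

# Synthetic geometric ranges (m) from two receivers to two satellites.
ranges = {("air", 1): 20_200_000.0, ("gnd", 1): 20_200_150.0,
          ("air", 2): 21_500_000.0, ("gnd", 2): 21_499_900.0}
receiver_clock = {"air": 35.0, "gnd": -12.0}   # receiver clock errors (m)
satellite_clock = {1: 4.0, 2: -7.5}            # satellite clock errors (m)

# Each observation is the range contaminated by both clock errors.
phase = {(rcv, sat): rho + receiver_clock[rcv] + satellite_clock[sat]
         for (rcv, sat), rho in ranges.items()}

dd_observed = double_difference(phase, "air", "gnd", 1, 2)
dd_geometry = double_difference(ranges, "air", "gnd", 1, 2)
```

The double difference of the contaminated observations equals the double difference of the pure geometry, so only the relative position information remains.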

Most of these experiments were carried out over small test areas of several kilometers
to about 30 km. Under such circumstances, many errors tend to cancel out, which leads to
a high achievable accuracy. However, with monitor-remote separations increasing to
several hundred kilometers, many errors, such as ionospheric and orbital effects,
become the limiting factors in the error budget. Based on experiments with a land vehicle
moving at high speeds with a monitor-remote separation of 65 km, Henderson and Leach
(1990) reported that a positioning accuracy better than 25 cm can be achieved. Colombo
(1991) demonstrated, according to simulation results, that airborne carrier phase-based
GPS positioning is possible at the decimeter level over very long distances
(monitor-remote separations of 1300 km) when properly accounting for errors. Cannon et
al. (1995) reported the accuracy of airborne carrier phase-based GPS positioning with
monitor-remote separations in the range of 50 to 200 km at the level of 10 cm using high
quality receivers and reliable ambiguity initialization. Morrison (2001) tested long
baselines with airborne laser data.

The  economic  benefit  of  precise  kinematic  GPS  positioning  for  mapping  and  other 
photogrammetric applications is clear. However, the importance of kinematic GPS
positioning is not limited to photogrammetry. Other remote sensors of geophysical
interest, such as airborne and satellite-borne altimeters and gravimeters, will benefit from
such precise GPS kinematic positions (Mader & Lucas, 1989). For precise position
determination,  differential  GPS  is  clearly  the  primary  method.  The  development  of  new 
GPS  technology,  which  provides  dual  frequency  carrier  phase  data  (for  example,  P 
codeless  receivers),  increases  the  performance  of  GPS  for  airborne  surveys.  This 
performance  was  improved  by  the  ability  to  correct  for  effects  from  the  ionosphere  and  to 
use  widelaning  (for  example,  Abidin,  1990).  These  two  factors  improve  the  reliability  of 
the  results  and  extend  the  range  at  which  high  accuracy  positions  can  be  achieved.  In  this 
already  well-established  industry  method,  the  DGPS  data  provide  partial  knowledge 
about  the  origin  of  the  image  in  space-time,  particularly  on  its  position.  By  coupling  this 
information  with  the  concept  of  overlapping  imagery  and  at  least  three  ground  control 
points  for  each  block  of  images,  the  remaining  parameters  of  exterior  orientation  can  be 
found  and  images  can  be  georeferenced. 
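
Two of the dual-frequency benefits mentioned above can be shown numerically. The sketch below uses the standard GPS L1 and L2 carrier frequencies; the function names and the synthetic range and delay values are assumptions, and only the first-order ionospheric effect (proportional to one over frequency squared) is modeled.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s
F_L1 = 1575.42e6                # GPS L1 carrier frequency (Hz)
F_L2 = 1227.60e6                # GPS L2 carrier frequency (Hz)

# Widelaning: the L1-L2 beat frequency has a much longer wavelength than
# either carrier, which makes ambiguities easier to resolve.
wavelength_l1 = SPEED_OF_LIGHT / F_L1
widelane_wavelength = SPEED_OF_LIGHT / (F_L1 - F_L2)

def ionosphere_free(range_l1, range_l2):
    """Combine two ranges so that the first-order ionospheric delay cancels."""
    f1_sq, f2_sq = F_L1 ** 2, F_L2 ** 2
    return (f1_sq * range_l1 - f2_sq * range_l2) / (f1_sq - f2_sq)

# Synthetic check: a true range plus a delay proportional to 1/f^2.
true_range = 21_000_000.0
iono_constant = 40.3 * 1e17     # hypothetical total electron content term
p1 = true_range + iono_constant / F_L1 ** 2
p2 = true_range + iono_constant / F_L2 ** 2
recovered = ionosphere_free(p1, p2)
```

The widelane wavelength is roughly 86 cm compared with about 19 cm on L1, and the combined range recovers the true range even though each single-frequency range is biased by more than a meter in this example.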

Although  the  introduction  of  DGPS  to  the  georeferencing  problem  has  not  completely 
eliminated  the  need  for  ground  control  and  overlapping  imagery,  it  has  pointed  the  way  to 
the  integration  of  DGPS  and  the  already  existing  inertial  technology. 

During the last few years, GPS multiantenna systems have been studied as an
alternative for attitude determination. In this case, changes in the orientation matrix are
directly determined from changes in the GPS data between a fixed cluster of GPS
antennae connected to a common receiver [for details, see Cohen & Parkinson (1991)].
The accuracy of orientation determination is largely dependent on the distance between
antennae and on the magnitude of multipath in the measurements. Comparable accuracies
have been reported by Lu et al. (1993). In most cases, the remote sensing applications
would require higher short-term orientation accuracy than that provided by GPS
multi-antenna systems. They may be useful, however, on very long missions to bound
gyro drift in inertial systems.

When  integrated  with  GPS  and  aerial  camera  systems,  the  inertial  measurement  unit 
(IMU)  measures  the  camera’s  six  exterior  orientation  parameters  to  an  accuracy  that 
enables  the  elimination  of  the  process  of  aerial  triangulation  for  a variety  of  mapping 
scales  (Abdullah  & Tuttle,  1997). 

With  advances  in  satellite  and  inertial  (IMU/GPS)  georeferencing  techniques  and  the 
ready  availability  of  digital  image  (CCD)  sensors,  a considerable  portion  of  GIS 
information  can  be  acquired  from  moving  platforms  operating  on  land,  water,  or  in  the 
air.  The  land-vehicle-based  acquisition  systems  are  usually  capable  of  delivering 
coordinates in object space with a typical absolute accuracy of 0.1 to 2 meters [see, for
instance, Ash et al. (1994), Bossler et al. (1993), and El-Sheimy (1996)]. However, the
carriers  that  are  in  highest  demand  by  industry  for  semiautomatic  mapping  are  aircraft 
and  helicopter.  To  design  an  airborne  survey  system  at  a reasonable  cost  and  with  an 
accuracy  of  a few  decimeters,  the  method  of  direct  georeferencing  needs  further 
improvements. 

Schwarz  et  al.  (1993)  presented  a general  model  for  the  georeferencing  of  remotely 
sensed  data  by  an  onboard  positioning  and  orientation  system  as  a problem  of  rigid  body 
motion.  The  determination  of  the  six  independent  parameters  of  motion  by  discrete 
measurements  from  inertial  and  satellite  systems  is  directly  related  to  the  problem  of 
exterior  orientation.  The  airborne  and  the  photogrammetric  mapping  industry  gave 
serious consideration to the use of the integrated GPS/inertial technology to measure
camera  attitude  to  an  accuracy  that  enables  the  process  of  photogrammetric  mapping 
without  the  determination  of  exterior  orientation  parameters  from  aerial  triangulation  or 
other  photogrammetric  means.  This  requirement  has  been  recognized  by  different 
academic  and  commercial  institutions  around  the  world,  and  a number  of  independent 
experiments  have  been  carried  out  [see,  for  instance,  Skaloud  et  al.  (1996),  Hutton  et  al. 
(1997),  Reid  et  al.  (1998),  Grejner-Brzezinska  & Toth  (1998),  and  Cramer  & Haala 
(1998)]  and  first  results  with  commercially  available  systems  appeared  (Abdullah  & 
Tuttle,  1997). 

Georeferencing  of  Image  Sensors 
Direct  Georeferencing  Method 

The concept of georeferencing of image sensors by IMU/GPS is not complex: for each
captured image, determine orientation (ω, φ, κ) and position (X0, Y0, Z0) of the principal
point at the moment of exposure so that it can be directly used in a chosen mapping
frame. An IMU/GPS system can provide this information with a quality that depends on
the navigation accuracy (Xins, Yins, Zins, roll, pitch, azimuth) and on the accuracy of
calibration parameters relating camera and IMU body frames. The processing chain,
which  contributes  to  the  overall  accuracy  of  an  acquisition  system,  is  affected  by  the 
accuracy  of  the  measured  image  data,  IMU/GPS  position  and  attitude,  system  calibration, 
optical  properties  of  the  cameras,  and  the  effect  of  image  geometry. 

In the data fusion concept, the direct georeferencing method can be regarded as a form of
data-level fusion. Figure B-1 shows data-level fusion as presented by Tuell et al. (2001).

Figure B-1. Data-level fusion (Tuell et al., 2000).

If directly measured orientation elements are utilized for sensor orientation, a
mathematical correction must be applied. Since the orientation
sensors are physically displaced from the sensor to be oriented, additional correction
terms are introduced [for example, Skaloud et al. (1996)]. Assuming an integrated
GPS/inertial  system  in  combination  with  an  imaging  sensor,  the  physical  shifts  between 
the  inertial  system  and  GPS-antenna  on  the  one  hand  and  the  perspective  center  of  the 
camera  on  the  other  hand  are  corrected  by  lever  arms  defined  in  the  local  aircraft  body 
frame.  For  each  system  installation,  these  specific  lever  arms  have  to  be  determined  using 
conventional terrestrial survey methods. The attitudes provided by the integrated
GPS/inertial system are referred to the inertial body frame coordinate axes. Thus, an
additional  misalignment  matrix  has  to  be  taken  into  account  to  transfer  the  measured 
attitudes  to  the  imaging  sensor  frame.  Since  the  misalignment  angles  between  the  IMU 
and  camera  frame  are  not  directly  observable  via  conventional  techniques,  they  have  to 
be  determined  indirectly  in  an  appropriate  calibration  procedure.  This  attitude  transfer  is 
a demanding  task  because  reference  orientations  of  superior  accuracy  are  necessary  for 
precise  alignment. 

Although  traditional  aerial  triangulation  provides  independent  attitude  information 
with  high  theoretical  accuracy,  the  estimated  values  are  affected  by  remaining  systematic 
errors,  and  they  do  not  agree  with  the  true  physical  orientation.  Nevertheless, 
photogrammetry provides the only method for determining the misalignment angles in a
kinematic  airborne  environment.  The  attitude  differences  between  the  exterior 
orientations  estimated  from  AT  and  GPS/inertial  techniques  must  be  compared  to 
determine  the  misorientation  calibration.  The  quality  of  the  misalignment  calibration  is 
strongly  dependent  on  the  budget  of  nonmodeled  systematic  errors  in  the  bundle 
adjustment. 

The calibrated misalignment angles should remain constant as long as there are no
relative movements between the two sensor components. After correcting the
GPS/inertial  exterior  orientations  by  the  positional  offsets  and  the  misalignment  angles, 
the  reduced  orientations  are  interpolated  at  the  exposure  times  of  the  imaging  sensor  to 
overcome  the  time  offset  between  the  different  sensors. 

Principles  of  Direct  Image  Georeferencing 

Despite the direct measurement of the exterior orientation (XL, YL, ZL, ω, φ, κ) of an
imaging sensor using an integrated system consisting of Global Positioning System (GPS)
receivers and inertial technology, image georeferencing still requires ground control
points (GCPs) with accurate elevations. If exterior orientation parameters and GCPs can
be collected for the images simultaneously, an automatic image georeferencing system can
easily be constructed. Such a system uses the collinearity equations, which express the
condition that the exposure station of any photograph, an object point, and its photo
image all lie on a straight line.

The collinearity equations are perhaps the most useful equations to the photogrammetrist.
In Figure B-2, exposure station L of an aerial photo has coordinates XL, YL, and ZL with
respect to the object (ground) coordinate system XYZ. Image a of object point A, shown in
a rotated image plane, has image space coordinates x'a, y'a, and z'a, where the rotated
image space coordinate system x'y'z' is parallel to the object-space coordinate system
XYZ (Wolf, 1983).

The University of Florida digital image system consists of a digital camera with 1524
by  1012  pixels,  which,  along  with  flying  height,  defines  the  ground  pixel  resolution.  The 
ground  area  covered  by  a pixel  depends  on  its  distance  from  the  nadir,  as  the  fundamental 
image  geometry  is  perspective  in  nature.  The  forward  movement  of  the  aircraft  produces 
the  track  images  by  capturing  successive  ground  images.  Therefore,  the  aircraft  motion 
directly  affects  the  model  geometry.  It  is  thus  necessary  to  model  the  aircraft  motion  that 
took  place  during  the  image  capture.  The  time  interval  between  successive  images  is 
typically 2.6 seconds. It is normally assumed that each frame can be treated as a
conventional photograph with perspective geometry. This means that the mathematical model
representing the relationship between the image and the ground is the well-known
collinearity equations.

Figure B-2. The collinearity condition (Wolf, 1983).

If there is no direct measurement of the position and attitude of each image, then a
time-dependent relationship must be established between all like parameters. The
geometric parameters are then determined by using ground control points in a
least-squares computation.

The  University  of  Florida  Airborne  Laser  Swath  Mapping  (ALSM)  system  and 
Airborne  Digital  Photograph  (ADP)  systems  simultaneously  provide  all  the  necessary 
information  to  apply  collinearity  equations,  such  as  high  resolution  digital  color  images, 
direct  measurement  of  exterior  orientation  parameters  from  integrated  GPS/inertial 
navigation  system,  and  well  distributed  Ground  Control  Points  (GCPs)  which  are 
georeferenced  ALSM  laser  points. 

Positional  Offset  and  Misalignment  Angle  Calibration 

The positions and orientations from GPS/IMU do not refer directly to the perspective
center of the imaging sensor. The GPS antenna and the center of the inertial system are
displaced from the camera and laser mirror by translational and rotational offsets. The
spatial shift between the sensor components can usually be measured using a
conventional terrestrial survey. The geometric locations of the ALSM system and digital
camera system in the aircraft are shown in Figure B-3. Figure B-4 shows the positional
offset between GPS, IMU, and imaging sensors. The attitudes from GPS/IMU are
calculated from the rotation of the IMU body frame, defined by the IMU sensor axes, to
the local level frame. Since the physical IMU sensor axes are not aligned to the image
coordinate frame (see Figure B-5), the misalignment has to be determined in order to use
the attitudes from the IMU for the georeferencing of the photogrammetric image data.

Figure B-3. University of Florida ALSM and Digital Camera System on aircraft.

Figure B-4. Positional Offset between GPS, IMU, and Imaging Sensors (Applanix).

Figure B-5. Reference frame orientation angles (Applanix).


The translational misalignment between the laser mirror frame and the IMU, the position
offsets between the laser mirror and GPS, and the position offsets between the laser
mirror and the IMU were calibrated in the laboratory by Optech (see Tables B-1 and B-2).
The integrated GPS/inertial system of the ALTM 1210 system is the POS/AV developed by
Applanix Corp. of Markham, Ontario, Canada. The Posproc software is used for
postprocessing.

The origin of the user reference frame is the laser mirror. The position offsets between
the laser mirror and the perspective center of the camera were determined using
conventional terrestrial survey methods after installation of the system in the aircraft
[x: 0.184 m, y: 0.019 m (source: VeriMap)], but the z offset was not measured at that
time. To obtain the position and attitude directly at the instant of image capture, new
offset parameters should be applied: the position offset between the camera and the GPS
lever arm, the position offset between the camera and the IMU lever arm, and the
misalignment between the camera and the IMU. The x, y, and z (approximately 0.075 m)
offsets between the laser mirror and the center of the camera lens are added to the
offsets from the laser mirror to the IMU lever arm and from the laser mirror to the GPS
lever arm.

$$P^{C}_{GPS} = P^{C}_{Laser} + P^{Laser}_{GPS}$$

$$P^{C}_{IMU} = P^{C}_{Laser} + P^{Laser}_{IMU}$$

where $P^{C}_{GPS}$ is the position offset between the camera and GPS, $P^{C}_{Laser}$ is the
position offset between the camera and laser, and $P^{Laser}_{GPS}$ is the position offset
between the laser and GPS; the IMU terms are defined in the same way. Table B-1 shows the
calculated position offset between the camera and the IMU lever arm. Table B-2 shows the
position offset between the camera and the GPS lever arm. The actual distances from the
GPS to the image sensors and from the IMU to the image sensors are drawn in Figures B-6
and B-7.
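
The offset composition described above can be sketched in a few lines of Python. This is only an illustration: the laser-to-IMU values are taken from Table B-1, the camera-to-laser magnitudes (0.184 m, 0.019 m, 0.075 m) come from the text, and the signs of the camera-to-laser vector are an assumption chosen so that the sum reproduces the camera column of Table B-1. The names `LASER_TO_IMU`, `CAMERA_TO_LASER`, and `add_offsets` are hypothetical.

```python
# Offsets are (x, y, z) lever arms in metres.
LASER_TO_IMU = (-0.101, -0.005, -0.111)    # Table B-1, laser mirror column

# Magnitudes from the text; the signs are an assumption (not stated in the source).
CAMERA_TO_LASER = (-0.184, -0.019, 0.075)


def add_offsets(a, b):
    """Component-wise sum of two lever-arm vectors, rounded to the millimetre."""
    return tuple(round(ai + bi, 3) for ai, bi in zip(a, b))


# Camera-to-IMU lever arm = camera-to-laser offset + laser-to-IMU offset.
camera_to_imu = add_offsets(CAMERA_TO_LASER, LASER_TO_IMU)
print(camera_to_imu)
```

Under these sign assumptions the result agrees with the camera column of Table B-1 to the millimetre.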

The  rotational  offsets  between  the  IMU  sensor  axes  and  the  camera  coordinate  system 
cannot  be  observed  via  conventional  survey  methods.  The  misalignment  angle 
determination  will  be  performed  after  angle  analysis. 

Table B-1. Positional Offsets Between Sensors and IMU Lever Arm

                     Laser mirror to IMU    Camera to IMU
X Lever Arm (m)            -0.101               -0.285
Y Lever Arm (m)            -0.005               -0.024
Z Lever Arm (m)            -0.111               -0.036

Table B-2. Positional Offsets Between Sensors and GPS Lever Arm

                     Laser mirror to GPS    Camera to GPS
X Lever Arm (m)            -0.024               -0.280
Y Lever Arm (m)             0.019               -0.080
Z Lever Arm (m)            -1.455               -1.280

Figure B-6. Profile of fuselage of aircraft that shows the distances between reference
lever arms and sensors in Z direction.

Figure B-7. Distances between reference lever arms and sensors in aircraft.

Time Alignment

A key prerequisite for using the GPS/inertial orientations with any sensor to be
oriented is precise time alignment between the different sensors. The camera shutter
is opened approximately every 2.6 seconds. Each exposure is automatically marked with
UTC time. This information is saved in a text file and must be converted to GPS
time. After processing with the Posproc software, a trajectory file for the camera lens
(GPS time, Xc, Yc, Zc) and an inertial navigation file for the camera lens (GPS time,
ω, φ, κ) are created, with records every 0.02 seconds. Omega, phi, and kappa are the
angles roll, pitch, and yaw, respectively, provided by the IMU. Since the camera shutter
is opened every 2.6 seconds and output data from Posproc are recorded every 0.02 seconds,
linear interpolation is applied to obtain the position of the camera (Xc, Yc, Zc) and the
INS information (ω, φ, κ) at the GPS time at which the camera shutter was opened.
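
As an illustration of this time-alignment step, the following Python sketch converts an exposure time from UTC to GPS time and linearly interpolates one trajectory channel to that instant. It is a minimal sketch, not the Posproc workflow: the function names are hypothetical, both time scales are assumed to be expressed in the same seconds count, and the 13-second GPS minus UTC offset is the leap-second difference in effect in October 2000. Angles that wrap around (for example, a heading crossing 360 degrees) would need extra handling that is not shown.

```python
from bisect import bisect_right

GPS_MINUS_UTC_S = 13.0  # leap-second offset assumed for the October 2000 flight


def utc_to_gps(utc_seconds, leap_offset=GPS_MINUS_UTC_S):
    """Shift a UTC time tag onto the GPS time scale."""
    return utc_seconds + leap_offset


def interpolate(times, values, t):
    """Linearly interpolate `values`, sampled at increasing `times`, to time t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("exposure time lies outside the trajectory span")
    i = bisect_right(times, t)
    if i == len(times):          # t coincides with the last record
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)     # fractional position between the two records
    return values[i - 1] + w * (values[i] - values[i - 1])


# Hypothetical 50 Hz records (0.02 s spacing) for one coordinate.
times = [100.00, 100.02, 100.04]
easting = [10.0, 10.2, 10.4]
x_at_exposure = interpolate(times, easting, 100.01)
print(x_at_exposure)
```

The same `interpolate` call would be repeated for each of the six channels (Xc, Yc, Zc, ω, φ, κ).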

The ALSM georeferenced X, Y, and Z ASCII data can be used as ground control
points to solve for the exterior orientation parameters of each photograph as a single
photo resection. ALSM laser data within the object space of a digital photo image were
selected within +/- 3 seconds of GPS time from the camera shutter opening time. After
this process, the exterior orientation [position (Xc, Yc, Zc), attitude (ω, φ, κ)] at
exposure time and many well-distributed ground control points (Xg, Yg, Zg) from ALSM are
available for use in the resection computation.
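
The time-window selection of laser points can be sketched as follows. The record layout (GPS time, X, Y, Z) and the function name `select_by_time` are assumptions made for illustration; the sample points are invented and only the exposure time is taken from the test image described later in this appendix.

```python
def select_by_time(points, exposure_time, window_s=3.0):
    """Keep laser points whose GPS time is within +/- window_s of the exposure."""
    return [p for p in points if abs(p[0] - exposure_time) <= window_s]


# Hypothetical laser records: (gps_time, X, Y, Z).
laser_points = [
    (419904.0, 369500.0, 3280400.0, 25.1),
    (419906.0, 369550.0, 3280450.0, 24.8),
    (419910.5, 369600.0, 3280500.0, 25.3),
    (419912.0, 369650.0, 3280550.0, 26.0),
]
selected = select_by_time(laser_points, 419908.4873696)
print(len(selected))
```

A further spatial filter (points falling inside the image footprint, bare-ground points only) would follow this step; it is not shown here.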

Angle  Analysis  for  Back-Projection 

In order to construct the collinearity equations from data provided by the University of
Florida (UF) sensor system, we can start with a ground point position (Xg, Yg, Zg) in local
plane coordinates, such as State Plane or UTM, and the camera position (Xc, Yc, Zc) in the
same coordinate system. The conversion from the IMU frame to the camera frame is
shown in Figure B-8.

In  the  IMU  frame,  Z is  downward  and  X is  forward.  In  the  UF  installation,  the  camera 
is  installed  backward.  Z is  upward,  and  90  degrees  must  be  added  to  the  yaw  angle 
because  the  IMU  uses  north  rather  than  east  as  the  positive  X-axis.  To  solve  the 
transformation problem from the IMU frame to the camera frame, the x-axes are made
coincident, with the y-axes pointing in opposite directions and the z-axes pointing in
opposite directions. The camera is up and the laser is down. The angles provided by the
IMU are intended to transform vectors measured by the laser
from the laser coordinate system into the ground coordinate system. A clockwise rotation
about  the  common  x-axis  (clockwise  as  viewed  from  the  positive  end  of  the  axis)  through 
the  given  roll  angle  brings  the  y-axes  of  both  the  laser  and  camera  into  a horizontal  plane. 
Typically, this angle would be considered a negative rotation. For clarification, the
rotation matrix is specified in Equation B-1.

$$R_x(-\omega) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\omega & -\sin\omega \\ 0 & \sin\omega & \cos\omega \end{bmatrix} \tag{B-1}$$

Figure B-8. Conversion from IMU frame to camera frame for angle adjustment.

Next, a clockwise rotation about the y-axis (clockwise as viewed from the positive end
of the y-axis of the laser) through the given pitch angle brings the common x-axis into the
horizontal plane and the z-axes of both into the vertical direction, with the z-axis of the
laser pointing down and the z-axis of the camera pointing up. This rotation, as seen from
the positive y-axis of the camera, is in the counterclockwise direction, which we call
positive, and is given by Equation B-2.

$$R_y(\phi) = \begin{bmatrix} \cos\phi & 0 & -\sin\phi \\ 0 & 1 & 0 \\ \sin\phi & 0 & \cos\phi \end{bmatrix} \tag{B-2}$$

Finally, a clockwise rotation about the z-axis (clockwise as viewed from the positive
end of the z-axis of the laser) through the given yaw angle brings the common x-axis to
north. Then a counterclockwise rotation about the same axis through 90 degrees causes
the positive x-axis of the laser to point east and, therefore, the positive x-axis of the
camera as well. This rotation, when viewed from the positive z-axis of the camera, is a
counterclockwise rotation of yaw - 90 degrees (see Equation B-3).

$$R_z\left(\kappa - \tfrac{\pi}{2}\right) = \begin{bmatrix} \cos\left(\kappa - \tfrac{\pi}{2}\right) & \sin\left(\kappa - \tfrac{\pi}{2}\right) & 0 \\ -\sin\left(\kappa - \tfrac{\pi}{2}\right) & \cos\left(\kappa - \tfrac{\pi}{2}\right) & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{B-3}$$
The orientation matrix obtained from these rotations will be

$$M = R_z\left(\kappa - \tfrac{\pi}{2}\right) R_y(\phi)\, R_x(-\omega) \tag{B-4}$$

which is used to project a distance measured by the laser to the ground using

(B-5)


where  the  vector  with  g subscripts  is  the  ground  point  position,  the  vector  with  the  L 
subscripts  is  the  laser  position,  and  D is  the  measured  distance. 

In the same way, we can compute a ground point position from an image measurement
using


(B-6)


where the vector with c subscripts is the camera position, x and y are the photo
coordinates, and f is the focal length. This assumes that we know the distance D, or that
we know the elevation (z-coordinate) of the ground point, from which we can solve for D.
Note that the photogrammetric convention that image coordinates have a z-coordinate of
minus f accomplishes the direction change that the ALSM system achieves by a coordinate
system with the z-axis pointing down.

The  matrix,  M,  derived  here  transforms  vectors  from  the  camera  coordinate  system 
into  the  ground  coordinate  system.  Therefore,  its  transpose  must  be  used  in  forming  the 
collinearity  equations. 


where 
Finally, the photo coordinates can be calculated with the familiar collinearity
condition  equations. 
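
The back-projection described in this section can be illustrated with a short Python sketch. It reads Equations B-1 to B-4 as the three elementary rotations and their product M, and then applies the transpose of M in the standard collinearity form, x = -f u/w and y = -f v/w, where (u, v, w) is the ground-minus-camera vector rotated into the camera frame. The function names are hypothetical, angles are assumed to be in radians, the principal point is assumed to lie at the photo origin, and lens distortion is ignored. It is a sketch of the idea, not the processing code used for the UF system.

```python
import math


def rot_x_neg(omega):
    """Roll rotation R_x(-omega)."""
    c, s = math.cos(omega), math.sin(omega)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(phi):
    """Pitch rotation R_y(phi)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]


def rot_z(angle):
    """Rotation about z; called with (kappa - pi/2) for the yaw step."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """3 x 3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orientation_matrix(omega, phi, kappa):
    """M = R_z(kappa - pi/2) R_y(phi) R_x(-omega)."""
    return matmul(rot_z(kappa - math.pi / 2.0),
                  matmul(rot_y(phi), rot_x_neg(omega)))


def back_project(ground, camera, m, focal_length):
    """Photo coordinates (x, y) of a ground point, using the transpose of M."""
    d = [g - c for g, c in zip(ground, camera)]
    # Component i of (M^T d) is the sum over k of M[k][i] * d[k].
    u, v, w = (sum(m[k][i] * d[k] for k in range(3)) for i in range(3))
    return (-focal_length * u / w, -focal_length * v / w)


# Level camera with kappa = 90 degrees, so M reduces to the identity matrix.
m = orientation_matrix(0.0, 0.0, math.pi / 2.0)
# Hypothetical ground point 60 m east of nadir, camera 600 m above the ground.
x, y = back_project((1060.0, 2000.0, 0.0), (1000.0, 2000.0, 600.0), m, 0.02845)
print(x, y)
```

In this level case a point 60 m east of nadir at 600 m flying height maps to x = f * 60 / 600, which is the expected simple scale relation for a vertical photograph.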


When the x, y photo coordinates and their related ground coordinates are known,
geocoded images are generated using commercial software, such as PCI Geomatics or
ERDAS Imagine. The geocoded images can then be mosaicked and used to produce
orthorectified images.

Calibration  of  Misalignment  Angle 

For misalignment determination, one image was selected near the GPS base station,
and seven well-distributed object space positions were collected using GPS. Those points
were then identified manually on the selected image frame. After the angle analysis for
the UF system, the misalignment angle between the IMU and the laser mirror can be applied
to check the pixel offset on the image frame between the GPS data and the calculation
from section 4.2. The photo coordinates calculated from the collinearity equations do
not, of course, coincide exactly with the GPS positions on the image frame, but the
comparison shows the pixel offset between the GPS data and the calculation in the x and y
photo coordinates. Since the offset for ω and φ depends on focal length and chip size, a
one-pixel shift on the image can be expressed as a function of misalignment angle, as
shown in Equation B-9.


w 


(B-8) 
$$D(\omega \text{ and } \phi) = \tan^{-1}\left(\frac{C_s}{f}\right) \tag{B-9}$$

where $C_s$ is the size of the CCD chip and $f$ is the focal length. The shift for κ is a
function of chip size and the distance between the principal point and the object point on
the image frame.

$$D(\kappa) = \tan^{-1}\left(\frac{C_s}{d}\right) \tag{B-10}$$

where $d$ is the distance between the principal point and the object point on the image
frame.
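
To give a feel for the magnitudes involved, the sketch below evaluates the angle that corresponds to a one-pixel shift. It reads Equations B-9 and B-10 as the arctangent of one CCD element over the focal length (for ω and φ) and over the radial image distance (for κ); that reading, the interpretation of the chip size as a single element of 8.93 micrometres, and the 500-pixel radial distance are assumptions made for illustration. The focal length of 28.45 mm is the value reported for the camera.

```python
import math

PIXEL_SIZE_M = 8.93e-6      # assumed size of one CCD element (micrometre scale)
FOCAL_LENGTH_M = 28.45e-3   # focal length reported for the camera


def pixel_angle_deg(pixel_size, arm):
    """Angle (degrees) subtended by one pixel at the given arm length."""
    return math.degrees(math.atan(pixel_size / arm))


# Roll/pitch: one pixel seen over the focal length.
omega_phi_per_pixel = pixel_angle_deg(PIXEL_SIZE_M, FOCAL_LENGTH_M)

# Yaw: one pixel seen over a hypothetical radial distance of 500 pixels.
kappa_per_pixel = pixel_angle_deg(PIXEL_SIZE_M, 500 * PIXEL_SIZE_M)

print(omega_phi_per_pixel, kappa_per_pixel)
```

Under these assumptions a one-pixel shift corresponds to roughly 0.02 degrees in roll or pitch, while the yaw sensitivity depends strongly on how far the point lies from the principal point.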

After processing in Posproc with the positional offset and misalignment angle between the
camera and the IMU, and the positional offset between the camera and GPS, the trajectory
file for position (X, Y, and Z) and the INS file for attitude (roll, pitch, heading) of the
camera lens are created. These six parameters at each camera exposure time are used as
exterior orientation parameters.

For misalignment calibration, a tennis court on the campus of the University of Florida
was chosen as the calibration area, and eight selected points were surveyed by GPS (see
Figure B-9). Using Equations B-9 and B-10, misalignment angles were calculated and
applied to back-projection processing for direct image georeferencing. To bring the
calculated and surveyed data into agreement, the misalignment angles in Table B-3 were
applied.

For accuracy assessment, the coordinates of eight points corresponding to the GPS
surveying points were carefully selected and compared to the surveyed GPS points (see
Table B-4). At points 6 and 8, the GPS data collection had been obstructed by large
trees and the differences were larger, but all points were below 1.5 pixels. Excluding
point 8, which could be an outlier, the offsets in Table B-4 seem to indicate a
systematic error; nevertheless, our data fusion resolution is within the tolerance shown
in the table, including point 8. With the determination of the positional offset and
misalignment angle for the perspective center of the camera, the calculated exterior
orientation parameters were applied to the back-projection with the ALSM laser points as
GCPs (see Figure B-10).


Table B-3. The Offset Between Sensors and IMU Misalignment

                   Laser to IMU    Camera to IMU
Roll (deg)             0.000           -0.371
Pitch (deg)            0.000           -0.078
Heading (deg)          0.200           -0.310

Figure  B-9.  Eight  ground  points  for  misalignment  calibration. 
Table B-4. Offset Between GPS Data and Calculation after Applying Subsystem
Parameters

       Ground Surveyed Data         Georeferenced Image         Offsets
Pt.    Easting        Northing      Easting       Northing      ΔE          ΔN
1      369030.2801    3280275.028   369030.269    3280275.197    0.011604   -0.1687
2      369033.2117    3280335.276   369033.129    3280335.393    0.082572   -0.11734
3      369059.1988    3280273.606   369059.054    3280273.782    0.144377   -0.17579
4      369062.2034    3280333.898   369062.006    3280334.059    0.197031   -0.16159
5      369091.8829    3280272.034   369091.658    3280272.082    0.225055   -0.0476
6      369094.8437    3280332.327   369094.554    3280332.414    0.289816   -0.08639
7      369055.1745    3280365.716   369055.019    3280365.796    0.15601    -0.07985
8      369089.1813    3280363.942   369089.448    3280363.92    -0.26709     0.022455

Study  Area  and  Data  Acquisition 

A test site was established on the UF campus on October 10, 2000. Five strips of
images were captured over the area with a ground pixel size of about 0.19 m at a 600 m
average flying height. Two base stations consisting of Ashtech Z-12 GPS receivers were
used to facilitate the differential processing of GPS data. The GPS data were collected at
1 Hz and the IMU rate was 50 Hz. The focal length of the camera was 28.45 mm and its
CCD size was 1524 by 1012 pixels. The size of the CCD elements is X: 8.93 µm and Y: 8.95 µm.

A test image taken at 419908.4873696 seconds GPS time was selected. At that GPS
time, the position of the perspective center was Xc: 369599.821 m, Yc: 3280510.774 m,
and Zc: 619.980 m in zone 17 at UTM-WGS84, and the attitude was ω: 3.30944 degrees,
φ: 1.29104 degrees, and κ: 86.98086 degrees. In the selected image frame, 34,005 ALSM
data points were collected, but only 55 were selected as GCPs after filtering for bare
ground data. The GCPWorks program (PCI Geomatics) was used to process the
georeferenced image, and a polynomial model was applied. The input data were photo
coordinates from the collinearity equations and the corresponding ground control points
from the ALSM data.
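
The quoted ground pixel size can be checked with the usual scale relation for a vertical photograph: ground pixel size = flying height * CCD element size / focal length. The sketch below assumes the CCD element size is 8.93 micrometres (a millimetre-scale element would not be consistent with a 0.19 m ground pixel); the function name is hypothetical.

```python
def ground_pixel_size(flying_height_m, pixel_size_m, focal_length_m):
    """Ground footprint of one pixel for a vertical photo over flat terrain."""
    return flying_height_m * pixel_size_m / focal_length_m


# 600 m flying height, 8.93 micrometre element (assumed unit), 28.45 mm focal length.
gsd = ground_pixel_size(600.0, 8.93e-6, 28.45e-3)
swath_m = 1524 * gsd   # across-image ground coverage for 1524 pixels
print(gsd, swath_m)
```

With these inputs the ground pixel size comes out close to the 0.19 m stated for the test flight.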


Results

Figure B-11 shows 55 GCPs from the ALSM on a selected digital image. The RMSE
of points on the image was X: 1.20 pixels and Y: 1.02 pixels, and the resulting
georeferenced image was created using a cubic convolution resampling method. We have
called the resulting georeferenced image a pseudo-orthorectified image (PORI) because
we used only GCPs at the ground level. Consequently, in the PORI, some elevated
objects are not in their correct planimetric position.

Figure B-10. Back projection from the ALSM laser data to perspective center.

Figure B-11. ALSM laser points as GCP on a digital image.

For accuracy assessment, seven well-distinguished, ground-level points in the PORI
were selected and a GPS ground survey was taken for these points. Table B-5 shows the
difference between GPS ground survey coordinates and the coordinates on the PORI, which
are in zone 17 at UTM-WGS84. The distance unit is meters. The GPS data were
collected for about 45 minutes, and six to eight satellites were locked during surveying.
The surveyed GPS data points reasonably matched the points of the PORI. The ground
resolution of a pixel is about 20 cm, and all test points are below 2 pixels difference at
619.980 m flying height. Figure B-12 shows a pseudo-orthorectified image by direct
georeferencing and seven test points for comparison between GPS ground survey and the
georeferenced image.

From  this  test,  the  scale  factor  should  be  investigated  as  the  next  step.  Further  tests  are 
to  be  carried  out  to  explore  a number  of  other  calibration  issues.  The  correlation  between 
parameters  needs  further  investigation  along  with  the  calibration  of  other  images.  Further 
investigations  will  take  place  into  reducing  the  number  of  control  points  for  the 
calibration  to  find  the  economical  optimum  number. 
Figure B-12. Georeferenced image by direct image georeferencing processing.
Table B-5. The Comparison Between GPS Ground Survey Data and the Same Points on
the Pseudo-orthorectified Image

       Ground Surveyed Data         Georeferenced Image         Offsets
Pt     Easting       Northing       Easting       Northing      ΔE        ΔN
1      369812.895    3280613.511    369812.509    3280613.111    0.386     0.400
2      369763.44     3280614.232    369763.560    3280613.857   -0.120     0.375
3      369678.672    3280586.594    369678.778    3280586.450   -0.106     0.144
4      369689.699    3280445.757    369689.442    3280446.107    0.257    -0.350
5      369741.582    3280464.204    369741.484    3280464.236    0.098    -0.032
6      369795.64     3280500.725    369795.446    3280500.602    0.194     0.123
7      369829.094    3280502.489    369828.932    3280502.628    0.161    -0.139
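
The offsets in Table B-5 can be summarized with a short Python sketch that computes the root-mean-square error per axis and expresses the largest single-axis offset in pixels. The offsets are copied from the table; the 0.20 m ground pixel size is the approximate value quoted in the text, and the variable and function names are hypothetical. These summary statistics are an illustration and are not figures reported in the original analysis.

```python
import math

# (delta E, delta N) in metres for test points 1 to 7, copied from Table B-5.
OFFSETS = [
    (0.386, 0.400), (-0.120, 0.375), (-0.106, 0.144), (0.257, -0.350),
    (0.098, -0.032), (0.194, 0.123), (0.161, -0.139),
]
GROUND_PIXEL_M = 0.20   # approximate ground resolution quoted in the text


def rmse(values):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rmse_e = rmse([de for de, _ in OFFSETS])
rmse_n = rmse([dn for _, dn in OFFSETS])
max_abs = max(abs(v) for pair in OFFSETS for v in pair)
max_pixels = max_abs / GROUND_PIXEL_M
print(rmse_e, rmse_n, max_pixels)
```

The largest single-axis offset (0.400 m at point 1) corresponds to about two pixels, which agrees with the statement that all test points differ by no more than about 2 pixels per axis.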

Discussion 

In  this  research,  direct  digital  image  georeferencing,  combined  with  the  integrated 
GPS/INS,  a digital  camera,  and  ALSM  laser  data,  was  established  for  testing  the 
possibility  of  replacing  a traditional  photogrammetric  solution.  The  concept  was  that  the 
ALSM  laser  data  can  be  used  as  ground  control  points  for  the  collinearity  equations  to 
calculate  photo  coordinates  of  images  with  position  and  attitude  information  from  the 
integrated  GPS/INS.  The  test  results  show  that  digital  color  pseudo-orthorectified  images 
were  successfully  created.  The  comparison  between  GPS  ground  points  and  related  points 
on the PORI shows that most points have an offset of only about 1 pixel and one point has
an offset of about 2 pixels. This may be caused by lens distortion in the camera system.
These results should be repeated with a calibrated camera.

A method or algorithm for calibrating the misalignment angle still needs to be developed
that avoids the aerotriangulation adjustment, allowing faster data processing and more
accurate image georeferencing.


APPENDIX  C 

SPECIFICS  OF  INTENSITY  DATA  FROM  ALSM  SYSTEM 


Intensity  of  Reflected  Laser  Pulse 

ALSM systems that record the intensity of the return allow the user to view surface features much as in a black-and-white photograph, which helps in visually interpreting the surveyed area.

Intensity is a reflectance value derived from the relation between the transmitted and received power. Let us assume that the laser footprint completely covers the target area that reflects the beam back to the source, and that the reflected power radiates uniformly into a hemisphere.

Based on the equations given by Baltsavias (1999) for the relation between the transmitted and received laser pulse, the total power reflected from the target is:

$$P_{refl} = \rho \, A_{tar} \, \Phi_{tar} = \rho \, \frac{\pi D_{tar}^2}{4} \, \frac{4 M P_T}{\pi (D + R\gamma)^2} \tag{C-1}$$

where

$\rho$ = reflectivity of target,

$A_{tar}$ (m2) = target area,

$\Phi_{tar}$ = power density within the illuminated area,

$P_T$ = transmitted power,

$M$ = atmospheric transmission,

$D_{tar}$ = diameter of target object,

$D$ = aperture diameter of the laser transmitter,

$R$ = range (distance between sensor and object),

$\gamma$ (mrad) = laser beam divergence (IFOV).

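The power reflected from the target, $P_{refl}$, is attenuated by the atmospheric transmission $M$ on the return path and radiates into a hemisphere of radius $R$. The part intercepted by the receiver area $A_r$ is the power collected by the receiver, $P_r$:

$$P_r = P_{refl} \, M \, \frac{A_r}{2 \pi R^2} \tag{C-2}$$

$$A_r = \frac{\pi D_r^2}{4} \tag{C-3}$$

where $D_r$ = diameter of the receiving optics. Combining the above equations, and assuming that the laser footprint area $A_l = \pi (D + R\gamma)^2 / 4$ equals the target area ($A_l = A_{tar}$), yields:

$$P_r = \rho \, \frac{M^2 A_r}{2 \pi R^2} \, P_T \tag{C-4}$$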


The above equation can also be used with the transmitted and received energy per pulse instead of the power. For the 1064 nm pulsed laser of the ALSM system, intensity can be expressed in the form shown in Equation C-5.

$$\text{Intensity} = \rho \, \frac{M^2 A_r}{2 \pi R^2} \, P_T \tag{C-5}$$


This is an approximation (Baltsavias, 1999), since there are additional losses, for example those due to the optical transmission of the transmitter and receiver optics and to the narrow bandpass filter used at the receiver to exclude background radiation.
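
The dependence of the received power on range and reflectivity can be sketched numerically. The Python example below assumes the relation P_r = ρ M² A_r P_T / (2πR²) with A_r = πD_r²/4, as given by Baltsavias (1999) for a footprint that matches the target area. The function names and all numeric inputs are hypothetical and do not describe the UF ALSM system.

```python
import math


def receiver_area(d_r):
    """Area of circular receiving optics of diameter d_r (m)."""
    return math.pi * d_r ** 2 / 4.0


def received_power(p_t, rho, m, d_r, r):
    """Received power for a diffuse target that fills the laser footprint.

    p_t : transmitted power (or pulse energy), any consistent unit
    rho : target reflectivity (0..1)
    m   : one-way atmospheric transmission (0..1)
    d_r : diameter of the receiving optics (m)
    r   : range from sensor to target (m)
    """
    return rho * m ** 2 * receiver_area(d_r) / (2.0 * math.pi * r ** 2) * p_t


# Hypothetical comparison: a bright and a dark surface at two ranges.
for rho in (0.5, 0.1):
    for r in (500.0, 1000.0):
        p_r = received_power(p_t=1.0, rho=rho, m=0.9, d_r=0.1, r=r)
        print(f"rho = {rho:.1f}, R = {r:6.0f} m -> P_r / P_T = {p_r:.3e}")
```

Under these assumptions the received power scales linearly with reflectivity and with the inverse square of the range, which is why the intensity of the same surface differs between flying heights.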

The returning intensity depends on the reflectance of the surface. At the 1064 nm wavelength, sand, water, concrete, and vegetation exhibit high reflectance. Dark surfaces, such as asphalt, wet surfaces, and mud, have much lower reflectivity. The intensity is also affected by the surface texture, the distance from the aircraft to the surface, and the orientation of the surface relative to the receiver.

Correlation Between ALSM Intensity and Target Reflectance

When discussing laser wavelength, one should also consider the backscattering properties of the target, that is, of the object surface. Figure C-1 is a digital color image taken by the UF ADP system. It shows a coastal scene near St. Augustine, Florida, which includes grass, a concrete sidewalk, an asphalt road, sand, and ocean. Figure C-2 shows the ALSM laser data of the same area. The ALSM points were manually divided into six land-cover groups, as shown in Table C-1.

Since the ALSM data provide height and intensity information at each X and Y coordinate, some ground objects can be classified using the relation between height and intensity. Figure C-3 shows the correlation between the ALSM intensity and height. Examination of Figure C-3 shows that, for this scene, buildings, asphalt, and ocean areas can be easily classified, but groups E and F, which consist of mixtures of concrete and grass and of grass and sand, respectively, could not be separated into single classes.

Figure C-1. Test area for checking reflectance at the wavelength of the ALSM laser.

Table C-1. Classification of ALSM Data Groups

Raw ALSM data group    Classification
A                      Ocean
B                      Asphalt
C                      Building 1
D                      Concrete building
E                      Concrete + Grass
F                      Grass + Sand

Figure C-2. Classification by correlation of ALSM elevation and intensity.

To check the reflectance of these materials at the 1064 nm wavelength of the ALSM laser, AVIRIS hyperspectral data for sand, asphalt, grass, and concrete are shown in Figure C-4. Asphalt has low reflectance, consistent with its lower intensity values, but the reflectances of grass, sand, and concrete are too close to one another to distinguish these materials using reflectance alone.


Figure C-3. Correlation of the ALSM height data to intensity data.

Figure C-4. Correlation between reflectance and wavelength of the AVIRIS hyperspectral data.

The reflectivity of a target at a given wavelength also influences the maximum range. Thus, the maximum range quoted in the specifications of manufacturers and system providers should always state the target reflectivity for which it applies (Wehr and Lohr, 1999).


APPENDIX  D 

GAUSSIAN  MAXIMUM  LIKELIHOOD  CLASSIFICATION 


Maximum likelihood classification quantitatively evaluates both the variance and covariance of the category spectral response patterns when classifying an unknown pixel, and it calculates the probability that a given pixel belongs to a specific class. To do this, it is assumed that the statistics for each class in each band are normally distributed (Lillesand & Kiefer, 1994). Each pixel is assigned to the class for which it has the highest probability (that is, the “maximum likelihood”). For the technical details, we follow Jensen (1996) and Richards and Jia (1999).

Suppose we have training sites on six-band images. Each pixel in each training site can then be represented by a measurement vector, $X_c$, such that

$$X_c = \begin{bmatrix} BV_{ij1} \\ BV_{ij2} \\ BV_{ij3} \\ \vdots \\ BV_{ijk} \end{bmatrix}$$

where $BV_{ijk}$ is the brightness value for the $ij$th pixel in band $k$. The brightness values for each pixel in each band in each training class can then be analyzed statistically to yield a mean measurement vector, $M_c$, for each class:

$$M_c = \begin{bmatrix} \mu_{c1} \\ \mu_{c2} \\ \mu_{c3} \\ \vdots \\ \mu_{ck} \end{bmatrix}$$

where $\mu_{ck}$ represents the mean value of the data obtained for class $c$ in band $k$. The raw measurement vectors can also be analyzed to yield the covariance matrix for each class $c$:


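$$V_c = V_{ckl} = \begin{bmatrix} \mathrm{Cov}_{c11} & \mathrm{Cov}_{c12} & \cdots & \mathrm{Cov}_{c1n} \\ \mathrm{Cov}_{c21} & \mathrm{Cov}_{c22} & \cdots & \mathrm{Cov}_{c2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}_{cn1} & \mathrm{Cov}_{cn2} & \cdots & \mathrm{Cov}_{cnn} \end{bmatrix}$$

where $\mathrm{Cov}_{ckl}$ is the covariance of class $c$ between bands $k$ and $l$. For brevity, the notation for the covariance matrix for class $c$ (i.e., $V_{ckl}$) will be shortened to just $V_c$.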
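
As a minimal illustration of how the mean vector and covariance matrix of one class can be estimated from training pixels, the following Python sketch uses NumPy. The brightness values are hypothetical, only three bands are used for brevity (the text above assumes six), and `class_statistics` is an illustrative name, not a routine from the original work.

```python
import numpy as np


def class_statistics(training_pixels):
    """Estimate the mean vector M_c and covariance matrix V_c of one class.

    training_pixels: array-like of shape (n_pixels, n_bands), one row per
    training pixel and one column per band.
    """
    x = np.asarray(training_pixels, dtype=float)
    mean_vector = x.mean(axis=0)
    # rowvar=False: columns are variables (bands); np.cov divides by n - 1.
    covariance = np.cov(x, rowvar=False)
    return mean_vector, covariance


# Hypothetical brightness values: 4 training pixels in 3 bands.
training = [
    [10, 20, 30],
    [12, 22, 29],
    [11, 19, 33],
    [13, 23, 32],
]

mean_c, cov_c = class_statistics(training)
print("M_c =", mean_c)
print("V_c =\n", cov_c)
```

The covariance matrix is symmetric, with the per-band variances on its diagonal. It must be estimated from more training pixels than there are bands, otherwise it is singular and cannot be inverted in the discriminant function.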

Given  these  parameters,  we  may  compute  the  statistical  probability  of  each  pixel  value 
being  a member  of  a particular  land  cover  class.  The  maximum  likelihood  classification 
is  developed  in  a statistically  acceptable  manner  as  follows: 

Let the spectral classes for an image be represented by

$$c_i, \quad i = 1, \ldots, N$$

where $N$ is the total number of classes. In determining the class to which a pixel at a location $x$ belongs, it is strictly the conditional probabilities

$$p(c_i \mid x), \quad i = 1, \ldots, N$$

that are of interest. The position vector $x$ is a column vector of brightness values for the pixel; it describes the pixel as a point in multispectral space with coordinates defined by the brightness values. The probability $p(c_i \mid x)$ gives the likelihood that the correct class is $c_i$ for a pixel at position $x$. The classification is performed according to

$$x \in c_i \quad \text{if} \quad p(c_i \mid x) > p(c_j \mid x) \quad \text{for all } j \neq i \tag{D-1}$$

That is, the pixel at $x$ is assigned to class $c_i$ if $p(c_i \mid x)$ is the largest.

This concept is very simple. The problem is that the $p(c_i \mid x)$ in (D-1) are unknown. However, suppose that sufficient training data are available for each class, so that a probability distribution can be estimated for each class that describes the chance of finding a pixel from class $c_i$ at the position $x$. This distribution is denoted $p(x \mid c_i)$. There are as many $p(x \mid c_i)$ as there are land-cover classes. In other words, for a pixel at a position $x$ in multispectral space, a set of probabilities can be computed that give the relative likelihoods that the pixel belongs to each available class. Figure D-1 shows equiprobability contours in the scatter diagram. The shape of the equiprobability contours expresses the sensitivity of the likelihood classifier to covariance. For example, because of this sensitivity, it can be seen that pixel 1 would be appropriately assigned to the vegetation class (Lillesand & Kiefer, 1994).


Figure  D-1.  Equiprobability  contours  defined  by  a maximum  likelihood  classifier. 
We can describe the relation between the desired $p(c_i \mid x)$ and the available $p(x \mid c_i)$, estimated from training data, by Bayes’s theorem (Freund, 1992):

$$p(c_i \mid x) = \frac{p(x \mid c_i) \, p(c_i)}{p(x)} \tag{D-2}$$

where $p(c_i)$ is the probability that class $c_i$ occurs in the image, and $p(x)$ is the probability of finding a pixel from any class at location $x$. Although $p(x)$ itself is not important in the equations that follow, it can be written as

$$p(x) = \sum_{i=1}^{N} p(x \mid c_i) \, p(c_i)$$

The $p(c_i \mid x)$ are called posterior probabilities, and the $p(c_i)$ are called a priori (prior) probabilities, since they are the probabilities with which the class membership of a pixel could be guessed before classification. Using (D-2), after the removal of $p(x)$ as a common factor, the classification rule of (D-1) can be restated as:

$$x \in c_i \quad \text{if} \quad p(x \mid c_i) \, p(c_i) > p(x \mid c_j) \, p(c_j) \quad \text{for all } j \neq i \tag{D-3}$$
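
The step from posterior probabilities to the product of likelihood and prior can be checked with a small numeric example. In the Python sketch below, the likelihoods and priors for three classes are hypothetical numbers chosen only for illustration. It shows that dividing by p(x) rescales all classes equally and therefore does not change which class wins.

```python
def posteriors(likelihoods, priors):
    """Posterior p(c_i | x) for each class from p(x | c_i) and p(c_i)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the sum over all classes
    return [j / evidence for j in joint]


def classify(likelihoods, priors):
    """Index of the class with the largest p(x | c_i) * p(c_i)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    return max(range(len(joint)), key=joint.__getitem__)


# Hypothetical values for one pixel and three classes.
likelihoods = [0.02, 0.05, 0.01]   # p(x | c_i), from training statistics
priors = [0.5, 0.3, 0.2]           # p(c_i), analyst's prior knowledge

post = posteriors(likelihoods, priors)
winner = classify(likelihoods, priors)
print("posteriors:", [round(p, 4) for p in post])
print("assigned class index:", winner)
```

Here the second class is selected even though its prior is smaller than that of the first class, because its likelihood at this pixel is sufficiently larger.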

The classification rule of (D-3) is more useful than that of (D-1) because the $p(x \mid c_i)$ are known from training data, and the $p(c_i)$ are also known or can be estimated from the analyst’s knowledge of the image. Mathematical convenience results if the definition

$$f_i(x) = \log_e \{ p(x \mid c_i) \, p(c_i) \} = \log_e p(x \mid c_i) + \log_e p(c_i) \tag{D-4}$$

is used, so that (D-3) is restated as

$$x \in c_i \quad \text{if} \quad f_i(x) > f_j(x) \quad \text{for all } j \neq i \tag{D-5}$$

This is, with one modification to follow, the decision rule used in maximum likelihood classification, and the $f_i(x)$ are referred to as discriminant functions.
To simplify the mathematics in the following equations, it is assumed that the probability distributions for the classes are multivariate normal. Under this assumption of normal statistics, the discriminant function of (D-4) for maximum likelihood classification becomes, after omitting a constant term common to all classes,

$$f_i(x) = \log_e p(c_i) - 0.5 \log_e |V_c| - 0.5 \, (x - M_c)^T V_c^{-1} (x - M_c) \tag{D-6}$$

where $M_c$ and $V_c$ are the mean vector and covariance matrix of class $c_i$.

Often the analyst has no useful information about the $p(c_i)$, in which case equal prior probabilities are assumed and $\log_e p(c_i)$ can be removed from (D-6). The common factor of 0.5 can then also be removed, leaving the discriminant function

$$f_i(x) = -\log_e |V_c| - (x - M_c)^T V_c^{-1} (x - M_c) \tag{D-7}$$

The maximum likelihood decision rule can be implemented using (D-6) or (D-7) in (D-5). Therefore, to classify a pixel of unknown class at $x$, the maximum likelihood decision rule computes the value $f_i(x)$ for each class and assigns the pixel to the class that has the largest (or maximum) value.
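
The decision rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the classification software used in this research. The two classes, their mean vectors and covariance matrices, and the test pixels are hypothetical, and only two bands are used so that the numbers are easy to follow.

```python
import numpy as np


def discriminant(x, mean, cov, prior=None):
    """Gaussian discriminant f_i(x) in the form of (D-6).

    With prior=None the log-prior term is dropped (equal priors), which
    gives one half of (D-7) and therefore the same class ranking.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    _, logdet = np.linalg.slogdet(cov)           # log_e |V_c|
    mahal = diff @ np.linalg.solve(cov, diff)    # (x - M_c)^T V_c^-1 (x - M_c)
    value = -0.5 * logdet - 0.5 * mahal
    if prior is not None:
        value += np.log(prior)
    return value


def classify(x, means, covs, priors=None):
    """Return the index of the class with the largest discriminant value."""
    scores = [
        discriminant(x, m, v, None if priors is None else priors[i])
        for i, (m, v) in enumerate(zip(means, covs))
    ]
    return int(np.argmax(scores))


# Hypothetical two-band statistics for two classes.
MEANS = [[50.0, 60.0], [120.0, 90.0]]
COVS = [[[25.0, 5.0], [5.0, 16.0]], [[36.0, -4.0], [-4.0, 49.0]]]

print(classify([55.0, 62.0], MEANS, COVS))    # pixel near the first mean
print(classify([115.0, 95.0], MEANS, COVS))   # pixel near the second mean
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a numerical design choice. It evaluates the same quadratic form as (D-6).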


APPENDIX  E 

TECHNIQUE  FOR  ASSESSING  THE  ACCURACY  OF  CLASSIFICATIONS 

One  of  the  most  common  methods  for  quantitatively  assessing  classification  accuracy 
is  a classification  error  matrix,  which  is  sometimes  called  a confusion  matrix. 

Here, we summarize the treatment of accuracy assessment for remotely sensed data and of the error matrix given by Jensen (1996) and Congalton and Green (1999).

Sample  Design 

To obtain unbiased ground reference information to compare with the remote sensing classification map and to fill the error matrix with values, the selection of a proper and efficient sample design is one of the most important components of an accuracy assessment. We considered several critical issues in designing an accuracy assessment sample that is truly representative of the map: (1) the appropriate sample unit, (2) the total number of samples to be collected by category, and (3) the selection of the samples.

Sample Unit

There are three choices of sample unit: a single pixel, a cluster of pixels, and a polygon. Historically, a single pixel has been a poor choice because it is an arbitrary rectangular delineation of the landscape that may have little relation to the actual delineation of land-cover type. A cluster of pixels, typically a 3 x 3 box, has been the most common choice for the sample unit, but it may still be an arbitrary delineation of the landscape, so that a sample unit can encompass more than one map category. It is important to remember that the sample units are the portions of the map that will be sampled for accuracy assessment. If the assessment is performed on clusters of pixels, nothing can be said either about single pixels or about polygons. More mapping projects based on remotely sensed data are now generating polygon products. If the objective of the mapping is to produce a polygon map, it is important that the assessment be conducted on a polygon basis. The polygon is therefore replacing the cluster of pixels as the sample unit of choice, and it was also used in this research.

Sample  Size 

The adequate number of samples for assessing the accuracy of individual categories is often difficult to determine. A number of researchers have used an equation based on the binomial distribution, or on the normal approximation to the binomial distribution, to compute the required sample size. Congalton (1996) presented the equations for calculating sample size together with an example. Although this method is acceptable for selecting the total number of pixels to be sampled, it was not designed to select a sample size for filling an error matrix. Congalton (1999) suggested, as a good rule of thumb, collecting a minimum of 50 samples for each land-cover class in the error matrix. If the area is especially large (that is, more than 1 million acres) or the classification has a large number of land-use categories (that is, more than 12 classes), the minimum number of samples should be increased to 75 or 100 samples per class. The number of samples can also be adjusted based on the relative importance of each category within the objectives of the mapping. Using this logic, at least 400 reference samples were necessary to assess the accuracy of the classification map in this research (that is, eight classes at 50 pixels each).
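
The two ways of choosing a sample size can be compared with a short Python sketch. The binomial equation is not reproduced in the text above, so the commonly cited form N = Z² p q / E² is used here as an assumption, with p the expected accuracy in percent, q = 100 - p, E the allowable error in percent, and Z = 2 for roughly 95% confidence. The function names and the input values 85% and 5% are hypothetical.

```python
import math


def binomial_sample_size(expected_accuracy_pct, allowable_error_pct, z=2.0):
    """Total sample size from the binomial formula N = Z^2 * p * q / E^2.

    Assumed form; p, q, and E are expressed in percent.
    """
    p = expected_accuracy_pct
    q = 100.0 - p
    return math.ceil(z ** 2 * p * q / allowable_error_pct ** 2)


def rule_of_thumb_total(n_classes, per_class=50):
    """Total sample size from a fixed minimum number of samples per class."""
    return n_classes * per_class


n_binomial = binomial_sample_size(expected_accuracy_pct=85.0,
                                  allowable_error_pct=5.0)
n_rule = rule_of_thumb_total(n_classes=8, per_class=50)
print("binomial formula:", n_binomial)
print("50 samples per class, 8 classes:", n_rule)
```

The binomial formula gives a single total and says nothing about how the samples are distributed among classes, which is why the per-class rule is preferred for filling an error matrix.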

Sampling  Selection 

The choice of samples is critical to generating an error matrix that is representative of the entire map. Five sampling schemes are commonly applied for collecting reference data: (1) simple random sampling, (2) systematic sampling, (3) stratified systematic unaligned sampling, (4) stratified random sampling, and (5) cluster sampling. The main advantage of simple random sampling is the good statistical properties that result from the random selection of samples. However, random sampling may miss small but possibly very important classes unless the sample size is large enough. Systematic sampling selects the sample units at some equal interval over the study area: the first sample is randomly selected, and the successive samples are taken at a specific interval thereafter. Stratified systematic unaligned sampling is a combined approach that uses the advantages of randomness and stratification with the ease of a systematic sample, without falling into the traps of periodicity common to systematic sampling.

Many researchers prefer stratified random sampling, in which a minimum number of samples is selected from each stratum (that is, each land-cover category). The major advantage of stratified random sampling is that all strata, no matter how small, will be included in the sample. However, stratified random samples can be selected only after the map has been completed, because the location of the strata must be known. In addition to the sampling schemes already discussed, cluster sampling has also been frequently used, especially to collect information on many samples quickly. However, the use of very large clusters is not a valid method of collecting data, because the pixels within a cluster are not independent of each other and each adds very little information. Congalton (1999) recommended that no clusters larger than 10 pixels, and certainly none larger than 25 pixels, be used, because of the lack of information added by each pixel beyond these cluster sizes.
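
Stratified random sampling can be illustrated with a small Python sketch. It groups the pixels of a classified map by class label and then draws a fixed number of random pixel locations from each group. The toy map, the function name, and the seed are hypothetical, and this is not the sampling procedure applied in this research, which used manually delineated polygons.

```python
import random


def stratified_random_sample(class_map, n_per_class, seed=0):
    """Draw up to n_per_class random (row, col) locations from each class.

    class_map: list of rows, each a list of integer class labels.
    Returns a dict mapping class label -> list of sampled (row, col) tuples.
    """
    rng = random.Random(seed)
    strata = {}
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            strata.setdefault(label, []).append((r, c))
    # Small strata contribute all of their pixels instead of being skipped.
    return {
        label: rng.sample(cells, min(n_per_class, len(cells)))
        for label, cells in strata.items()
    }


# Hypothetical 3 x 3 classified map with three classes; class 3 is rare.
toy_map = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 1, 2],
]
samples = stratified_random_sample(toy_map, n_per_class=2, seed=1)
for label in sorted(samples):
    print(label, samples[label])
```

Because the strata are built from the classified map itself, this scheme can only be applied after the classification has been completed, as noted above.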

In this research, the reference data were collected from the color image and from ground identification, without GPS measurements. We tried to capture almost all of the variation within each category. For example, we collected all of the different asphalt materials in the asphalt road category. We applied cluster sampling with polygon sample units, because we wanted to delineate polygons for some important objects that random selection could miss. Each polygon contained 10 to 25 pixels and had no specific shape. The reference data points are shown in Figure E-1, and the sample size in each class is shown in Table E-1.

Figure E-1. The reference data in each class.

Table E-1. The Sample Polygons in Each Class

Class                Number of Polygons    Number of Pixels
Building             35                    620
Tree                 47                    788
Grass                36                    694
Asphalt Road         30                    304
Concrete Sidewalk    41                    798
Median Brick         12                    158
Car                  16                    198
Metal on the Roof    7                     102
Total                224                   3662

Evaluation of Error Matrices

After the reference information has been collected, the error matrix is constructed and evaluated. An error matrix compares two sources of information: (1) the classification map and (2) known reference data (ground truth). An error matrix is a square array of numbers laid out in rows and columns that express the number of sample units (that is, pixels, clusters of pixels, or polygons) assigned to a particular category in one classification relative to the number of sample units assigned to a particular category in another classification. The columns usually represent the reference data, and the rows indicate the classification generated from the remotely sensed data. An error matrix is a very effective way to represent map accuracy because the accuracy of each category is plainly described, along with both the errors of inclusion (commission errors) and the errors of exclusion (omission errors). A commission error is defined as including an area in a category to which it does not belong. An omission error is excluding an area from the category to which it belongs. Every error is both an omission from the correct category and a commission to a wrong category.

In addition to showing errors of omission and commission, the error matrix can be used to compute the overall accuracy, the producer’s accuracy, and the user’s accuracy. Overall accuracy is computed by dividing the sum of the major diagonal (that is, the correctly classified pixels) by the total number of pixels in the entire error matrix. This value is the most commonly reported statistic in accuracy assessment. However, it is also important to report the accuracy of individual categories, and the producer’s accuracy and the user’s accuracy are ways of doing so. The producer’s accuracy is computed by dividing the number of correct pixels in a category (class) by the total number of pixels of that category as derived from the reference data. This accuracy shows how well the producer (the analyst) classified a certain area. The user’s accuracy is computed by dividing the number of correct pixels in a category by the total number of pixels that were actually classified in that category. The user’s accuracy is the probability that a pixel classified on the map actually represents that category on the ground (Story & Congalton, 1986).

Here, we give an example to illustrate these accuracies. The overall accuracy of the error matrix shown in Table E-2 is 72.95%. However, suppose we are interested in the ability to classify trees. We calculate the “producer’s accuracy” for this category by dividing the number of correct pixels in the tree class (that is, 65) by the total number of tree pixels indicated by the reference data (that is, 75, the column total). This division results in a producer’s accuracy of 86.66%, which is quite good. However, the “user’s accuracy” for this category, obtained by dividing the number of correct pixels in the tree class (that is, 65) by the total number of pixels classified as tree (that is, 115, the row total), is only 56.52%. Although 86.66% of the trees have been correctly identified as trees in the classification, only 56.52% of the areas labeled as trees on the classification map are actually trees on the ground. A careful look at the whole error matrix, rather than at a single accuracy figure, helps to avoid a serious mistake.
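
The accuracies in this example can be reproduced with a few lines of Python. The matrix below is the example error matrix of Table E-2, with rows as the classification and columns as the reference data. The function names are illustrative only.

```python
CLASSES = ["Tree", "Building", "Grass", "Road"]

# Example error matrix (Table E-2): rows = classification, columns = reference.
MATRIX = [
    [65, 4, 22, 24],
    [6, 80, 7, 6],
    [0, 13, 85, 21],
    [4, 8, 4, 91],
]


def overall_accuracy(m):
    """Sum of the major diagonal divided by the total number of samples."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def producers_accuracy(m, k):
    """Correct samples of class k divided by the column (reference) total."""
    return m[k][k] / sum(row[k] for row in m)


def users_accuracy(m, k):
    """Correct samples of class k divided by the row (classified) total."""
    return m[k][k] / sum(m[k])


print(f"Overall accuracy: {100 * overall_accuracy(MATRIX):.2f}%")
for k, name in enumerate(CLASSES):
    print(f"{name:9s} producer's {100 * producers_accuracy(MATRIX, k):6.2f}%  "
          f"user's {100 * users_accuracy(MATRIX, k):6.2f}%")
```

Printed values are rounded to two decimals, so they may differ in the last digit from percentages that were truncated instead of rounded.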

Table E-2. Example Error Matrix (rows: classification; columns: reference data)

Class       Tree    Building    Grass    Road    Total
Tree        65      4           22       24      115
Building    6       80          7        6       99
Grass       0       13          85       21      119
Road        4       8           4        91      107
Total       75      105         118      142     440

Overall Accuracy = (65+80+85+91)/440 = 72.95%

Class       Producer’s Accuracy    User’s Accuracy
Tree        65/75 = 86.66%         65/115 = 56.52%
Building    80/105 = 76.19%        80/99 = 80.8%
Grass       85/118 = 72.03%        85/119 = 71.43%
Road        91/142 = 64.08%        91/107 = 85.05%

Kappa Analysis

Kappa analysis is a discrete multivariate technique used in accuracy assessment for statistically determining whether one error matrix is significantly different from another (Congalton & Mead, 1983). The result of performing a Kappa analysis is a Khat statistic (an estimate of Kappa), which is a measure of agreement or accuracy (Rosenfield & Fitzpatrick-Lins, 1986; Congalton, 1999). Kappa analysis has become a standard component of most accuracy assessments since Congalton (1983) published this analysis technique in a remote sensing journal. This measure of agreement is based on the difference between the actual agreement in the error matrix (i.e., the agreement between the remotely sensed classification and the reference data, as indicated by the major diagonal) and the chance agreement, which is indicated by the row and column totals (i.e., the marginals). The Khat statistic is computed as

$$K_{hat} = \frac{n \sum_{i=1}^{k} n_{ii} - \sum_{i=1}^{k} n_{i+} \, n_{+i}}{n^2 - \sum_{i=1}^{k} n_{i+} \, n_{+i}}$$

where $k$ is the number of rows in the matrix, $n_{ii}$ is the number of samples in row $i$ and column $i$, $n_{i+}$ and $n_{+i}$ are the marginal totals for row $i$ and column $i$, respectively, and $n$ is the total number of samples.
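
As a worked illustration, the Khat statistic can be computed for the example error matrix of Table E-2 with the short Python sketch below. The function name `kappa_hat` is hypothetical, and the matrix is repeated so that the example is self-contained.

```python
# Example error matrix (Table E-2): rows = classification, columns = reference.
ERROR_MATRIX = [
    [65, 4, 22, 24],
    [6, 80, 7, 6],
    [0, 13, 85, 21],
    [4, 8, 4, 91],
]


def kappa_hat(m):
    """Khat = (n * sum(n_ii) - sum(n_i+ * n_+i)) / (n^2 - sum(n_i+ * n_+i))."""
    k = len(m)
    n = sum(sum(row) for row in m)                        # total samples
    diagonal = sum(m[i][i] for i in range(k))             # actual agreement
    row_totals = [sum(m[i]) for i in range(k)]            # n_i+
    col_totals = [sum(row[i] for row in m) for i in range(k)]  # n_+i
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * diagonal - chance) / (n * n - chance)


khat = kappa_hat(ERROR_MATRIX)
print(f"Khat = {khat:.4f}")
```

For this matrix the Khat value is lower than the overall accuracy of 72.95%, because the agreement expected by chance from the row and column totals is subtracted.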